{
 "cells": [
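  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Description:\n",
    "> Tianchi O2O coupon usage prediction competition walkthrough (beginner level)\n",
    ">> * This is a beginner-level starter project. Its main purpose is to get familiar with the competition workflow and with how competition data is analyzed. Unlike the datasets used in routine machine learning exercises, competition data comes from the real world, so preprocessing takes real effort. The sections below work through that process.\n",
    "\n",
    ">>**Competition link:** [Tianchi O2O coupon usage prediction](https://tianchi.aliyun.com/getStart/introduction.htm?spm=5176.100066.0.0.518433afBqXIKM&raceId=231593)\n",
    ">>\n",
    ">> New functions and data-processing techniques used here are summarized at the end.\n",
    "\n",
    ">> Some notes, mainly about the dataset and the competition background, are also recorded in Youdao Cloud Notes. \n",
    "[Notes link](http://note.youdao.com/noteshare?id=1943b28cfa29694ed2c4c74793806c91&sub=A62BFD3FF3424414A7FCBBF767F2623B)"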
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Import the required packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import libraries necessary for this project\n",
    "import os, sys, pickle\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "from datetime import date\n",
    "\n",
    "from sklearn.model_selection import KFold, train_test_split, StratifiedKFold, cross_val_score, GridSearchCV\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.linear_model import SGDClassifier, LogisticRegression\n",
    "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n",
    "from sklearn.metrics import log_loss, roc_auc_score, auc, roc_curve\n",
    "\n",
    "# display for this notebook\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
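  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Load the data\n",
    "> `keep_default_na=False` controls whether pandas keeps its default list of strings that are parsed as missing values. With it set to `False` and no `na_values` given, no string is parsed as NaN when the file is read. By default, `pd.read_csv` converts missing entries, including the literal string `null`, to NaN; pass this parameter to keep the raw strings instead."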
   ]
  },
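  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> A minimal sketch of the effect, assuming a small made-up CSV (the two rows below are illustrative and not taken from the dataset): with the default settings the literal string `null` is parsed as NaN, while `keep_default_na=False` keeps it as text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical two-row sample; only the column names mirror the real files\n",
    "sample = \"User_id,Coupon_id\\n1,null\\n2,8591\\n\"\n",
    "\n",
    "default_df = pd.read_csv(io.StringIO(sample))\n",
    "raw_df = pd.read_csv(io.StringIO(sample), keep_default_na=False)\n",
    "\n",
    "# Default parsing turns the string 'null' into NaN\n",
    "print(default_df['Coupon_id'].isna().tolist())\n",
    "# keep_default_na=False leaves 'null' as a literal string\n",
    "print(raw_df['Coupon_id'].tolist())"
   ]
  },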
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>User_id</th>\n",
       "      <th>Merchant_id</th>\n",
       "      <th>Coupon_id</th>\n",
       "      <th>Discount_rate</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Date_received</th>\n",
       "      <th>Date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>null</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>null</td>\n",
       "      <td>20160217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1439408</td>\n",
       "      <td>4663</td>\n",
       "      <td>11002</td>\n",
       "      <td>150:20</td>\n",
       "      <td>1</td>\n",
       "      <td>20160528</td>\n",
       "      <td>null</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>1078</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160319</td>\n",
       "      <td>null</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160613</td>\n",
       "      <td>null</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   User_id  Merchant_id Coupon_id Discount_rate Distance Date_received  \\\n",
       "0  1439408         2632      null          null        0          null   \n",
       "1  1439408         4663     11002        150:20        1      20160528   \n",
       "2  1439408         2632      8591          20:1        0      20160217   \n",
       "3  1439408         2632      1078          20:1        0      20160319   \n",
       "4  1439408         2632      8591          20:1        0      20160613   \n",
       "\n",
       "       Date  \n",
       "0  20160217  \n",
       "1      null  \n",
       "2      null  \n",
       "3      null  \n",
       "4      null  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dfoff = pd.read_csv('dataset/ccf_offline_stage1_train.csv', keep_default_na=False) \n",
    "dfon = pd.read_csv('dataset/ccf_online_stage1_train.csv', keep_default_na=False)\n",
    "dftest = pd.read_csv('dataset/ccf_offline_stage1_test_revised.csv', keep_default_na=False)\n",
    "\n",
    "# Take a quick look at the data\n",
    "# print(dfoff.shape)   # 1754884, 7\n",
    "dfoff.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Basic data analysis\n",
    "> Count how users use coupons. There are four cases:\n",
    ">> * Date_received != null && Date != null  --- coupon received, purchase made\n",
    ">> * Date_received == null && Date != null  --- no coupon, purchase made\n",
    ">> * Date_received != null && Date == null --- coupon received, no purchase\n",
    ">> * Date_received == null && Date == null --- no coupon, no purchase"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
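    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Coupon received, purchase made: 75382\n",
      "Coupon received, no purchase: 977900\n",
      "No coupon, purchase made: 701602\n",
      "No coupon, no purchase: 0\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'As the counts show, many purchases (701602) were made without a coupon, and many coupons (977900) were received but never used; only a few purchases (75382) actually used a coupon.\\nSo the point of this competition is to get coupons to the people who are actually likely to buy.'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print('Coupon received, purchase made: %d' % dfoff[(dfoff['Date_received'] != 'null') & (dfoff['Date'] != 'null')].shape[0])\n",
    "print('Coupon received, no purchase: %d' % dfoff[(dfoff['Date_received'] != 'null') & (dfoff['Date'] == 'null')].shape[0])\n",
    "print('No coupon, purchase made: %d' % dfoff[(dfoff['Date_received'] == 'null') & (dfoff['Date'] != 'null')].shape[0])\n",
    "print('No coupon, no purchase: %d' % dfoff[(dfoff['Date_received'] == 'null') & (dfoff['Date'] == 'null')].shape[0])\n",
    "\n",
    "\"\"\"As the counts show, many purchases (701602) were made without a coupon, and many coupons (977900) were received but never used; only a few purchases (75382) actually used a coupon.\n",
    "So the point of this competition is to get coupons to the people who are actually likely to buy.\"\"\""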
   ]
  },
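  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Feature extraction\n",
    "> The raw data is messy, shows no obvious pattern, and contains many null values, so usable information has to be extracted from it as features before anything can be fed to a model for prediction. Note that this is a classification problem: the final prediction is the probability that a user will use a coupon.\n",
    ">\n",
    "> Feature extraction matters a great deal in competitions. Unlike a simple machine learning exercise, where the dataset arrives already split into features and labels, here there are neither features nor labels, and both have to be constructed by hand."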
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>User_id</th>\n",
       "      <th>Merchant_id</th>\n",
       "      <th>Coupon_id</th>\n",
       "      <th>Discount_rate</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Date_received</th>\n",
       "      <th>Date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>null</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>null</td>\n",
       "      <td>20160217</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1439408</td>\n",
       "      <td>4663</td>\n",
       "      <td>11002</td>\n",
       "      <td>150:20</td>\n",
       "      <td>1</td>\n",
       "      <td>20160528</td>\n",
       "      <td>null</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>1078</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160319</td>\n",
       "      <td>null</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160613</td>\n",
       "      <td>null</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   User_id  Merchant_id Coupon_id Discount_rate Distance Date_received  \\\n",
       "0  1439408         2632      null          null        0          null   \n",
       "1  1439408         4663     11002        150:20        1      20160528   \n",
       "2  1439408         2632      8591          20:1        0      20160217   \n",
       "3  1439408         2632      1078          20:1        0      20160319   \n",
       "4  1439408         2632      8591          20:1        0      20160613   \n",
       "\n",
       "       Date  \n",
       "0  20160217  \n",
       "1      null  \n",
       "2      null  \n",
       "3      null  \n",
       "4      null  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Look at the data to see which features can be extracted\n",
    "dfoff.head(5)"
   ]
  },
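  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4.1 Discount rate: Discount_rate\n",
    "> Since these are coupons, what users care about most is the discount, so the Discount_rate column is the first place to look for information.<br>\n",
    ">\n",
    "> First, check what values it contains."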
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Discount_rate values:\n",
      " ['null' '150:20' '20:1' '200:20' '30:5' '50:10' '10:5' '100:10' '200:30'\n",
      " '20:5' '30:10' '50:5' '150:10' '100:30' '200:50' '100:50' '300:30'\n",
      " '50:20' '0.9' '10:1' '30:1' '0.95' '100:5' '5:1' '100:20' '0.8' '50:1'\n",
      " '200:10' '300:20' '100:1' '150:30' '300:50' '20:10' '0.85' '0.6' '150:50'\n",
      " '0.75' '0.5' '200:5' '0.7' '30:20' '300:10' '0.2' '50:30' '200:100'\n",
      " '150:5']\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'The values come in several different formats, so they need to be separated by type and converted to a common format.'"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print('Discount_rate values:\\n',dfoff['Discount_rate'].unique())\n",
    "\n",
    "\"\"\"The values come in several different formats, so they need to be separated by type and converted to a common format.\"\"\""
   ]
  },
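  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Discount_rate takes three forms:\n",
    ">> * \"null\" means no discount\n",
    ">> * a number in [0, 1] is a direct discount rate\n",
    ">> * x:y means y off for purchases of at least x\n",
    ">\n",
    "> How to handle them:\n",
    ">> * First determine which of the forms above a value has, with getDiscountType()\n",
    ">> * Then express every value as a discount rate, with convertRate()\n",
    ">> * For the third form, also extract the spending threshold (getDiscountMan()) and the amount taken off (getDiscountJian())\n",
    ">\n",
    "> These functions are defined below."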
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Deal with Discount_rate\n",
    "def getDiscountType(row):\n",
    "    # 'null': no coupon; 1: threshold form 'x:y'; 0: direct discount rate\n",
    "    if row == 'null':\n",
    "        return 'null'\n",
    "    elif ':' in row:\n",
    "        return 1\n",
    "    else:\n",
    "        return 0\n",
    "\n",
    "def convertRate(row):\n",
    "    # Express every form as a discount rate in [0, 1]\n",
    "    if row == 'null':\n",
    "        return 1.0\n",
    "    elif ':' in row:\n",
    "        rows = row.split(':')\n",
    "        return 1.0 - float(rows[1]) / float(rows[0])\n",
    "    else:\n",
    "        # Already a direct rate such as '0.9': keep its value\n",
    "        return float(row)\n",
    "\n",
    "def getDiscountMan(row):\n",
    "    # Spending threshold x of an 'x:y' coupon, 0 for the other forms\n",
    "    if ':' in row:\n",
    "        return int(row.split(':')[0])\n",
    "    else:\n",
    "        return 0\n",
    "\n",
    "def getDiscountJian(row):\n",
    "    # Amount y taken off by an 'x:y' coupon, 0 for the other forms\n",
    "    if ':' in row:\n",
    "        return int(row.split(':')[1])\n",
    "    else:\n",
    "        return 0\n",
    "\n",
    "# Add the discount features to a DataFrame\n",
    "def processData(df):\n",
    "    df['discount_type'] = df['Discount_rate'].apply(getDiscountType)\n",
    "    df['discount_rate'] = df['Discount_rate'].apply(convertRate)\n",
    "    df['discount_man'] = df['Discount_rate'].apply(getDiscountMan)\n",
    "    df['discount_jian'] = df['Discount_rate'].apply(getDiscountJian)\n",
    "\n",
    "    print(df['discount_rate'].unique())\n",
    "    return df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1.         0.86666667 0.95       0.9        0.83333333 0.8\n",
      " 0.5        0.85       0.75       0.66666667 0.93333333 0.7\n",
      " 0.6        0.         0.96666667 0.98       0.99       0.975\n",
      " 0.33333333 0.4       ]\n",
      "[0.83333333 0.9        0.96666667 0.8        0.95       0.75\n",
      " 0.         0.98       0.5        0.86666667 0.6        0.66666667\n",
      " 0.7        0.85       0.33333333 0.94       0.93333333 0.975\n",
      " 0.99      ]\n"
     ]
    }
   ],
   "source": [
    "# Apply the processing and look at the resulting rates\n",
    "dfoff = processData(dfoff)\n",
    "dftest = processData(dftest)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>User_id</th>\n",
       "      <th>Merchant_id</th>\n",
       "      <th>Coupon_id</th>\n",
       "      <th>Discount_rate</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Date_received</th>\n",
       "      <th>Date</th>\n",
       "      <th>discount_type</th>\n",
       "      <th>discount_rate</th>\n",
       "      <th>discount_man</th>\n",
       "      <th>discount_jian</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>null</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>null</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1439408</td>\n",
       "      <td>4663</td>\n",
       "      <td>11002</td>\n",
       "      <td>150:20</td>\n",
       "      <td>1</td>\n",
       "      <td>20160528</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>150</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>1078</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160319</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160613</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   User_id  Merchant_id Coupon_id Discount_rate Distance Date_received  \\\n",
       "0  1439408         2632      null          null        0          null   \n",
       "1  1439408         4663     11002        150:20        1      20160528   \n",
       "2  1439408         2632      8591          20:1        0      20160217   \n",
       "3  1439408         2632      1078          20:1        0      20160319   \n",
       "4  1439408         2632      8591          20:1        0      20160613   \n",
       "\n",
       "       Date discount_type  discount_rate discount_man discount_jian  \n",
       "0  20160217          null       1.000000            0             0  \n",
       "1      null             1       0.866667          150            20  \n",
       "2      null             1       0.950000           20             1  \n",
       "3      null             1       0.950000           20             1  \n",
       "4      null             1       0.950000           20             1  "
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dfoff.head(5)"
   ]
  },
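  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4.2 Distance: Distance\n",
    "> Distance also influences whether a user shops at a merchant: it matters whether the store is near or far.\n",
    ">\n",
    "> Next, look at the values in dfoff['Distance']."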
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distance values:  ['0' '1' 'null' '2' '10' '4' '7' '9' '3' '5' '6' '8']\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'The next step is to convert Distance to integers, replacing null with -1, so that every value is an integer and easier to process.'"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Distance values: \", dfoff['Distance'].unique())\n",
    "\n",
    "\"\"\"The next step is to convert Distance to integers, replacing null with -1, so that every value is an integer and easier to process.\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 0  1 -1  2 10  4  7  9  3  5  6  8]\n",
      "[ 1 -1  5  2  0 10  3  6  7  4  9  8]\n"
     ]
    }
   ],
   "source": [
    "# convert Distance\n",
    "dfoff['Distance'] = dfoff['Distance'].replace('null', -1).astype(int)\n",
    "dftest['Distance'] = dftest['Distance'].replace('null', -1).astype(int)   # astype converts the column dtype\n",
    "\n",
    "print(dfoff['Distance'].unique())\n",
    "print(dftest['Distance'].unique())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>User_id</th>\n",
       "      <th>Merchant_id</th>\n",
       "      <th>Coupon_id</th>\n",
       "      <th>Discount_rate</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Date_received</th>\n",
       "      <th>Date</th>\n",
       "      <th>discount_type</th>\n",
       "      <th>discount_rate</th>\n",
       "      <th>discount_man</th>\n",
       "      <th>discount_jian</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>null</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>null</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1439408</td>\n",
       "      <td>4663</td>\n",
       "      <td>11002</td>\n",
       "      <td>150:20</td>\n",
       "      <td>1</td>\n",
       "      <td>20160528</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>150</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>1078</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160319</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160613</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   User_id  Merchant_id Coupon_id Discount_rate  Distance Date_received  \\\n",
       "0  1439408         2632      null          null         0          null   \n",
       "1  1439408         4663     11002        150:20         1      20160528   \n",
       "2  1439408         2632      8591          20:1         0      20160217   \n",
       "3  1439408         2632      1078          20:1         0      20160319   \n",
       "4  1439408         2632      8591          20:1         0      20160613   \n",
       "\n",
       "       Date discount_type  discount_rate discount_man discount_jian  \n",
       "0  20160217          null       1.000000            0             0  \n",
       "1      null             1       0.866667          150            20  \n",
       "2      null             1       0.950000           20             1  \n",
       "3      null             1       0.950000           20             1  \n",
       "4      null             1       0.950000           20             1  "
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Look at the data again\n",
    "dfoff.head(5)"
   ]
  },
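  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4.3 Coupon receipt date: Date_received\n",
    "> People may be more likely to pick up coupons on weekends, when they have time to shop.<br>\n",
    "> Note that the task is to predict the probability that a coupon is used within 15 days of receipt, so this date needs to be bounded accordingly.<br>\n",
    "> Start by checking its range."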
   ]
  },
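  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> One way to express the 15-day limit is to measure the gap between the receipt date and the purchase date. The cell below is a minimal sketch under that assumption: `used_within_window` is a hypothetical helper rather than part of the competition kit, and whether the boundary day itself counts should be checked against the competition rules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def used_within_window(date_received, date_used, days=15):\n",
    "    # Hypothetical helper: True if a coupon received on date_received\n",
    "    # was used on date_used no more than `days` days later.\n",
    "    if date_received == 'null' or date_used == 'null':\n",
    "        return False\n",
    "    received = pd.to_datetime(date_received, format='%Y%m%d')\n",
    "    used = pd.to_datetime(date_used, format='%Y%m%d')\n",
    "    gap = used - received\n",
    "    return pd.Timedelta(0) <= gap <= pd.Timedelta(days=days)\n",
    "\n",
    "# Illustrative inputs in the same YYYYMMDD string format as the dataset\n",
    "print(used_within_window('20160528', '20160606'))\n",
    "print(used_within_window('20160528', '20160630'))\n",
    "print(used_within_window('20160528', 'null'))"
   ]
  },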
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Coupons received from 20160101 to 20160615\n",
      "Purchases made from 20160101 to 20160630\n"
     ]
    }
   ],
   "source": [
    "date_received = dfoff['Date_received'].unique()\n",
    "date_received = sorted(date_received[date_received != 'null'])\n",
    "\n",
    "date_buy = dfoff['Date'].unique()\n",
    "date_buy = sorted(date_buy[date_buy != 'null'])\n",
    "\n",
    "print('Coupons received from',date_received[0],'to',date_received[-1])\n",
    "print('Purchases made from',date_buy[0],'to',date_buy[-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Features derived from the receipt date:\n",
    ">> * weekday: {null, 1, 2, 3, 4, 5, 6, 7}\n",
    ">> * weekday_type: {1, 0}  (1 for weekends, 0 otherwise)\n",
    "\n",
    "> Converted to one-hot encoding:\n",
    ">> * weekday_1: {1, 0, 0, 0, 0, 0, 0}\n",
    ">> * weekday_2: {0, 1, 0, 0, 0, 0, 0}\n",
    ">> * weekday_3: {0, 0, 1, 0, 0, 0, 0}\n",
    ">> * weekday_4: {0, 0, 0, 1, 0, 0, 0}\n",
    ">> * weekday_5: {0, 0, 0, 0, 1, 0, 0}\n",
    ">> * weekday_6: {0, 0, 0, 0, 0, 1, 0}\n",
    ">> * weekday_7: {0, 0, 0, 0, 0, 0, 1}"
   ]
  },
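  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> As a small check of the encoding described above, the sketch below computes the weekday number for a few dates and one-hot encodes them. The three date strings are illustrative inputs that also appear in the preview of `dfoff`; the variable names are assumptions of this sketch. Passing `prefix` to `pd.get_dummies` names the indicator columns automatically, but only for weekday values that actually occur in the input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import date\n",
    "import pandas as pd\n",
    "\n",
    "samples = pd.Series(['20160528', '20160217', '20160613'])\n",
    "\n",
    "# date.weekday() is 0 for Monday, so add 1 to get 1 (Monday) to 7 (Sunday)\n",
    "weekday = samples.apply(lambda s: date(int(s[0:4]), int(s[4:6]), int(s[6:8])).weekday() + 1)\n",
    "print(weekday.tolist())\n",
    "\n",
    "# One indicator column per weekday value present in the sample\n",
    "print(pd.get_dummies(weekday, prefix='weekday'))"
   ]
  },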
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "def getWeekday(row):\n",
    "    if row == 'null':\n",
    "        return row\n",
    "    else:\n",
    "        return date(int(row[0:4]), int(row[4:6]), int(row[6:8])).weekday() + 1   # date.weekday() returns 0 (Monday) to 6 (Sunday), so add 1\n",
    "\n",
    "dfoff['weekday'] = dfoff['Date_received'].astype(str).apply(getWeekday)\n",
    "dftest['weekday'] = dftest['Date_received'].astype(str).apply(getWeekday)\n",
    "\n",
    "# weekday_type: 1 for Saturday and Sunday, 0 otherwise\n",
    "dfoff['weekday_type'] = dfoff['weekday'].apply(lambda x: 1 if x in [6, 7] else 0)\n",
    "dftest['weekday_type'] = dftest['weekday'].apply(lambda x: 1 if x in [6, 7] else 0)\n",
    "\n",
    "# change weekday to one-hot encoding \n",
    "weekdaycols = ['weekday_' + str(i) for i in range(1, 8)]\n",
    "\n",
    "tmpdf = pd.get_dummies(dfoff['weekday'].replace('null', np.nan))\n",
    "#print(tmpdf)\n",
    "tmpdf.columns = weekdaycols\n",
    "dfoff[weekdaycols] = tmpdf\n",
    "\n",
    "tmpdf = pd.get_dummies(dftest['weekday'].replace('null', np.nan))\n",
    "tmpdf.columns = weekdaycols\n",
    "dftest[weekdaycols] = tmpdf\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>User_id</th>\n",
       "      <th>Merchant_id</th>\n",
       "      <th>Coupon_id</th>\n",
       "      <th>Discount_rate</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Date_received</th>\n",
       "      <th>Date</th>\n",
       "      <th>discount_type</th>\n",
       "      <th>discount_rate</th>\n",
       "      <th>discount_man</th>\n",
       "      <th>discount_jian</th>\n",
       "      <th>weekday</th>\n",
       "      <th>weekday_type</th>\n",
       "      <th>weekday_1</th>\n",
       "      <th>weekday_2</th>\n",
       "      <th>weekday_3</th>\n",
       "      <th>weekday_4</th>\n",
       "      <th>weekday_5</th>\n",
       "      <th>weekday_6</th>\n",
       "      <th>weekday_7</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>null</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>null</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1439408</td>\n",
       "      <td>4663</td>\n",
       "      <td>11002</td>\n",
       "      <td>150:20</td>\n",
       "      <td>1</td>\n",
       "      <td>20160528</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>150</td>\n",
       "      <td>20</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>1078</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160319</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160613</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>null</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>null</td>\n",
       "      <td>20160516</td>\n",
       "      <td>null</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160516</td>\n",
       "      <td>20160613</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1832624</td>\n",
       "      <td>3381</td>\n",
       "      <td>7610</td>\n",
       "      <td>200:20</td>\n",
       "      <td>0</td>\n",
       "      <td>20160429</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>200</td>\n",
       "      <td>20</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2029232</td>\n",
       "      <td>3381</td>\n",
       "      <td>11951</td>\n",
       "      <td>200:20</td>\n",
       "      <td>1</td>\n",
       "      <td>20160129</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>200</td>\n",
       "      <td>20</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2029232</td>\n",
       "      <td>450</td>\n",
       "      <td>1532</td>\n",
       "      <td>30:5</td>\n",
       "      <td>0</td>\n",
       "      <td>20160530</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.833333</td>\n",
       "      <td>30</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   User_id  Merchant_id Coupon_id Discount_rate  Distance Date_received  \\\n",
       "0  1439408         2632      null          null         0          null   \n",
       "1  1439408         4663     11002        150:20         1      20160528   \n",
       "2  1439408         2632      8591          20:1         0      20160217   \n",
       "3  1439408         2632      1078          20:1         0      20160319   \n",
       "4  1439408         2632      8591          20:1         0      20160613   \n",
       "5  1439408         2632      null          null         0          null   \n",
       "6  1439408         2632      8591          20:1         0      20160516   \n",
       "7  1832624         3381      7610        200:20         0      20160429   \n",
       "8  2029232         3381     11951        200:20         1      20160129   \n",
       "9  2029232          450      1532          30:5         0      20160530   \n",
       "\n",
       "       Date discount_type  discount_rate discount_man discount_jian weekday  \\\n",
       "0  20160217          null       1.000000            0             0    null   \n",
       "1      null             1       0.866667          150            20       6   \n",
       "2      null             1       0.950000           20             1       3   \n",
       "3      null             1       0.950000           20             1       6   \n",
       "4      null             1       0.950000           20             1       1   \n",
       "5  20160516          null       1.000000            0             0    null   \n",
       "6  20160613             1       0.950000           20             1       1   \n",
       "7      null             1       0.900000          200            20       5   \n",
       "8      null             1       0.900000          200            20       5   \n",
       "9      null             1       0.833333           30             5       1   \n",
       "\n",
       "   weekday_type  weekday_1  weekday_2  weekday_3  weekday_4  weekday_5  \\\n",
       "0             0          0          0          0          0          0   \n",
       "1             1          0          0          0          0          0   \n",
       "2             0          0          0          1          0          0   \n",
       "3             1          0          0          0          0          0   \n",
       "4             0          1          0          0          0          0   \n",
       "5             0          0          0          0          0          0   \n",
       "6             0          1          0          0          0          0   \n",
       "7             0          0          0          0          0          1   \n",
       "8             0          0          0          0          0          1   \n",
       "9             0          1          0          0          0          0   \n",
       "\n",
       "   weekday_6  weekday_7  \n",
       "0          0          0  \n",
       "1          1          0  \n",
       "2          0          0  \n",
       "3          1          0  \n",
       "4          0          0  \n",
       "5          0          0  \n",
       "6          0          0  \n",
       "7          0          0  \n",
       "8          0          0  \n",
       "9          0          0  "
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dfoff.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> To recap, these are the features extracted so far:\n",
    "> * discount_rate\n",
    "> * discount_type\n",
    "> * discount_man\n",
    "> * discount_jian\n",
    "> * Distance\n",
    "> * weekday\n",
    "> * weekday_type\n",
    "> * weekday_1\n",
    "> * weekday_2\n",
    "> * weekday_3\n",
    "> * weekday_4\n",
    "> * weekday_5\n",
    "> * weekday_6\n",
    "> * weekday_7"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
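    "### 5. Labeling\n",
    "> With the features in place, the next step is to build the labels, because we do not have a target y yet.\n",
    ">\n",
    "> How should each row be labeled?  \n",
    ">> Start from the goal: predict the probability that a user redeems a coupon within 15 days of receiving it. A positive sample is therefore a row where a coupon was received and used within 15 days. <br>\n",
    ">> This gives three cases:\n",
    ">> - Date_received == 'null': no coupon was received, so the row is excluded from modeling, y = -1\n",
    ">>- (Date_received != 'null') & (Date != 'null') & (Date - Date_received <= 15): a coupon was received and used within 15 days, a positive sample, y = 1\n",
    ">>- (Date_received != 'null') & ((Date == 'null') | (Date - Date_received > 15)): a coupon was received but not used within 15 days, a negative sample, y = 0"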
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Labeling function: -1 = no coupon received, 1 = used within 15 days, 0 = otherwise\n",
    "def label(row):\n",
    "    if row['Date_received'] == 'null':\n",
    "        return -1\n",
    "    if row['Date'] != 'null':\n",
    "        td = pd.to_datetime(row['Date'], format='%Y%m%d') - pd.to_datetime(row['Date_received'], format='%Y%m%d')\n",
    "        if td <= pd.Timedelta(15, 'D'):            # Timedelta is a duration, so it compares directly with a date difference\n",
    "            return 1\n",
    "    return 0\n",
    "\n",
    "dfoff['label'] = dfoff.apply(label, axis=1)"
   ]
  },
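  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> The row-wise `apply` above can be slow on a table with well over a million rows. As a minimal sketch (not part of the original notebook), the same three-case rule could be written with vectorized pandas operations. The helper name `label_vectorized` and the small `demo` frame are hypothetical, and the sketch assumes missing dates are stored as the string 'null', as in this dataset.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "def label_vectorized(df):\n",
    "    # Hypothetical helper: same rule as label(row), applied to whole columns.\n",
    "    # 'null' strings fail to parse and become NaT because of errors='coerce'.\n",
    "    received = pd.to_datetime(df['Date_received'], format='%Y%m%d', errors='coerce')\n",
    "    used = pd.to_datetime(df['Date'], format='%Y%m%d', errors='coerce')\n",
    "    gap = used - received                     # NaT when either date is missing\n",
    "    # Comparisons against NaT are False, so unused coupons fall through to 0.\n",
    "    labels = np.where(gap <= pd.Timedelta(days=15), 1, 0)\n",
    "    # Rows without a received coupon are marked -1, overriding the step above.\n",
    "    labels = np.where(received.isna(), -1, labels)\n",
    "    return pd.Series(labels, index=df.index)\n",
    "\n",
    "# Tiny made-up frame covering the three cases plus a late redemption.\n",
    "demo = pd.DataFrame({\n",
    "    'Date_received': ['null', '20160528', '20160516', '20160501'],\n",
    "    'Date':          ['20160217', 'null', '20160613', '20160510'],\n",
    "})\n",
    "demo_labels = label_vectorized(demo)\n",
    "print(demo_labels.tolist())\n",
    "```\n",
    "\n",
    "> On the demo frame the four rows are: no coupon, coupon never used, coupon used after 28 days, and coupon used after 9 days. Only the last one is a positive sample."
   ]
  },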
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 0    988887\n",
      "-1    701602\n",
      " 1     64395\n",
      "Name: label, dtype: int64\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>User_id</th>\n",
       "      <th>Merchant_id</th>\n",
       "      <th>Coupon_id</th>\n",
       "      <th>Discount_rate</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Date_received</th>\n",
       "      <th>Date</th>\n",
       "      <th>discount_type</th>\n",
       "      <th>discount_rate</th>\n",
       "      <th>discount_man</th>\n",
       "      <th>...</th>\n",
       "      <th>weekday</th>\n",
       "      <th>weekday_type</th>\n",
       "      <th>weekday_1</th>\n",
       "      <th>weekday_2</th>\n",
       "      <th>weekday_3</th>\n",
       "      <th>weekday_4</th>\n",
       "      <th>weekday_5</th>\n",
       "      <th>weekday_6</th>\n",
       "      <th>weekday_7</th>\n",
       "      <th>label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>null</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>null</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>null</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1439408</td>\n",
       "      <td>4663</td>\n",
       "      <td>11002</td>\n",
       "      <td>150:20</td>\n",
       "      <td>1</td>\n",
       "      <td>20160528</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>150</td>\n",
       "      <td>...</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160217</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>...</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>1078</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160319</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>...</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160613</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 21 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   User_id  Merchant_id Coupon_id Discount_rate  Distance Date_received  \\\n",
       "0  1439408         2632      null          null         0          null   \n",
       "1  1439408         4663     11002        150:20         1      20160528   \n",
       "2  1439408         2632      8591          20:1         0      20160217   \n",
       "3  1439408         2632      1078          20:1         0      20160319   \n",
       "4  1439408         2632      8591          20:1         0      20160613   \n",
       "\n",
       "       Date discount_type  discount_rate discount_man  ... weekday  \\\n",
       "0  20160217          null       1.000000            0  ...    null   \n",
       "1      null             1       0.866667          150  ...       6   \n",
       "2      null             1       0.950000           20  ...       3   \n",
       "3      null             1       0.950000           20  ...       6   \n",
       "4      null             1       0.950000           20  ...       1   \n",
       "\n",
       "  weekday_type  weekday_1  weekday_2  weekday_3  weekday_4  weekday_5  \\\n",
       "0            0          0          0          0          0          0   \n",
       "1            1          0          0          0          0          0   \n",
       "2            0          0          0          1          0          0   \n",
       "3            1          0          0          0          0          0   \n",
       "4            0          1          0          0          0          0   \n",
       "\n",
       "   weekday_6  weekday_7  label  \n",
       "0          0          0     -1  \n",
       "1          1          0      0  \n",
       "2          0          0      0  \n",
       "3          1          0      0  \n",
       "4          0          0      0  \n",
       "\n",
       "[5 rows x 21 columns]"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(dfoff['label'].value_counts())\n",
    "dfoff.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
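    "### 6. Building the model\n",
    "> With labels and features ready, we can build a model. We start with SGDClassifier.\n",
    ">> - Use the 14 features extracted above.\n",
    ">> - Training set: coupons received 20160101-20160515; validation set: coupons received 20160516-20160615.\n",
    ">> - Use the linear model SGDClassifier."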
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6.1 Splitting into training and validation sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train Set: \n",
      " 0    759172\n",
      "1     41524\n",
      "Name: label, dtype: int64\n",
      "\n",
      "Valid Set: \n",
      " 0    229715\n",
      "1     22871\n",
      "Name: label, dtype: int64\n",
      "(800696, 21)\n"
     ]
    }
   ],
   "source": [
    "# data split\n",
    "df = dfoff[dfoff['label']!=-1].copy()\n",
    "train = df[(df['Date_received'] < '20160516')].copy()\n",
    "valid = df[(df['Date_received'] >= '20160516') & (df['Date_received'] <= '20160615')].copy()\n",
    "\n",
    "print('Train Set: \\n', train['label'].value_counts())\n",
    "print('\\nValid Set: \\n', valid['label'].value_counts())\n",
    "print(train.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6.2 Feature list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "14 features in total\n",
      "['discount_rate', 'discount_type', 'discount_man', 'discount_jian', 'Distance', 'weekday', 'weekday_type', 'weekday_1', 'weekday_2', 'weekday_3', 'weekday_4', 'weekday_5', 'weekday_6', 'weekday_7']\n"
     ]
    }
   ],
   "source": [
    "# feature\n",
    "original_feature = ['discount_rate','discount_type','discount_man', 'discount_jian','Distance', 'weekday', 'weekday_type'] + weekdaycols\n",
    "print(\"{} features in total\".format(len(original_feature)))\n",
    "print(original_feature)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6.3 Defining the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [],
   "source": [
    "def check_model(data, predictors):\n",
    "    \n",
    "    classifier = lambda:SGDClassifier(\n",
    "        loss = 'log',            # logistic-regression loss (named 'log_loss' in scikit-learn >= 1.1)\n",
    "        penalty = 'elasticnet',   # mix of L1 and L2 regularization: (1 - l1_ratio) * L2 + l1_ratio * L1\n",
    "        fit_intercept=True,  # fit an intercept term (the default)\n",
    "        max_iter=100, \n",
    "        shuffle=True,  # Whether or not the training data should be shuffled after each epoch\n",
    "        n_jobs=1, # The number of processors to use\n",
    "        class_weight=None # Weights associated with classes. If not given, all classes are supposed to have weight one.\n",
    "     )    \n",
    "    # A Pipeline bundles the scaler and the classifier, so the scaling fitted on the training data is reused unchanged on new data (e.g. the validation set).\n",
    "    model = Pipeline(steps=[\n",
    "        ('ss', StandardScaler()), # transformer\n",
    "        ('en', classifier())  # estimator\n",
    "    ])\n",
    "    \n",
    "    param_grid = {\n",
    "        'en__alpha': [ 0.001, 0.01, 0.1],\n",
    "        'en__l1_ratio': [ 0.001, 0.01, 0.1]\n",
    "    }\n",
    "    \n",
    "    # StratifiedKFold works like KFold but samples per class, so each fold keeps the class proportions of the full dataset.\n",
    "    folder = StratifiedKFold(n_splits=3, shuffle=True)\n",
    "    # Exhaustive search over specified parameter values for an estimator.\n",
    "    grid_search = GridSearchCV(\n",
    "        model, \n",
    "        param_grid, \n",
    "        cv=folder, \n",
    "        n_jobs=-1,  # -1 means using all processors\n",
    "        verbose=1)\n",
    "    grid_search = grid_search.fit(data[predictors], \n",
    "                                  data['label'])\n",
    "    \n",
    "    return grid_search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6.4 Training the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 3 folds for each of 9 candidates, totalling 27 fits\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.\n",
      "[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:   24.8s finished\n"
     ]
    }
   ],
   "source": [
    "predictors = original_feature\n",
    "model = check_model(train, original_feature)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
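    "#### 6.5 Validating the model\n",
    "> Compute the AUC separately for each coupon in the validation set, then average the AUC over all coupons. A coupon whose labels contain only one class is skipped, because AUC is undefined in that case."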
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>User_id</th>\n",
       "      <th>Merchant_id</th>\n",
       "      <th>Coupon_id</th>\n",
       "      <th>Discount_rate</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Date_received</th>\n",
       "      <th>Date</th>\n",
       "      <th>discount_type</th>\n",
       "      <th>discount_rate</th>\n",
       "      <th>discount_man</th>\n",
       "      <th>...</th>\n",
       "      <th>weekday_type</th>\n",
       "      <th>weekday_1</th>\n",
       "      <th>weekday_2</th>\n",
       "      <th>weekday_3</th>\n",
       "      <th>weekday_4</th>\n",
       "      <th>weekday_5</th>\n",
       "      <th>weekday_6</th>\n",
       "      <th>weekday_7</th>\n",
       "      <th>label</th>\n",
       "      <th>pred_prob</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1439408</td>\n",
       "      <td>4663</td>\n",
       "      <td>11002</td>\n",
       "      <td>150:20</td>\n",
       "      <td>1</td>\n",
       "      <td>20160528</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>150</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.020414</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160613</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.104395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1439408</td>\n",
       "      <td>2632</td>\n",
       "      <td>8591</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160516</td>\n",
       "      <td>20160613</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.104395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2029232</td>\n",
       "      <td>450</td>\n",
       "      <td>1532</td>\n",
       "      <td>30:5</td>\n",
       "      <td>0</td>\n",
       "      <td>20160530</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.833333</td>\n",
       "      <td>30</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.098702</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>2029232</td>\n",
       "      <td>6459</td>\n",
       "      <td>12737</td>\n",
       "      <td>20:1</td>\n",
       "      <td>0</td>\n",
       "      <td>20160519</td>\n",
       "      <td>null</td>\n",
       "      <td>1</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>20</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.131515</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 22 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    User_id  Merchant_id Coupon_id Discount_rate  Distance Date_received  \\\n",
       "1   1439408         4663     11002        150:20         1      20160528   \n",
       "4   1439408         2632      8591          20:1         0      20160613   \n",
       "6   1439408         2632      8591          20:1         0      20160516   \n",
       "9   2029232          450      1532          30:5         0      20160530   \n",
       "10  2029232         6459     12737          20:1         0      20160519   \n",
       "\n",
       "        Date discount_type  discount_rate discount_man  ... weekday_type  \\\n",
       "1       null             1       0.866667          150  ...            1   \n",
       "4       null             1       0.950000           20  ...            0   \n",
       "6   20160613             1       0.950000           20  ...            0   \n",
       "9       null             1       0.833333           30  ...            0   \n",
       "10      null             1       0.950000           20  ...            0   \n",
       "\n",
       "   weekday_1  weekday_2  weekday_3  weekday_4  weekday_5  weekday_6  \\\n",
       "1          0          0          0          0          0          1   \n",
       "4          1          0          0          0          0          0   \n",
       "6          1          0          0          0          0          0   \n",
       "9          1          0          0          0          0          0   \n",
       "10         0          0          0          1          0          0   \n",
       "\n",
       "    weekday_7  label  pred_prob  \n",
       "1           0      0   0.020414  \n",
       "4           0      0   0.104395  \n",
       "6           0      0   0.104395  \n",
       "9           0      0   0.098702  \n",
       "10          0      0   0.131515  \n",
       "\n",
       "[5 rows x 22 columns]"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# valid predict\n",
    "# print(valid.head(5))\n",
    "y_valid_pred = model.predict_proba(valid[predictors])\n",
    "valid1 = valid.copy()\n",
    "valid1['pred_prob'] = y_valid_pred[:, 1]    # column 1 holds the predicted probability of the positive class\n",
    "valid1.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Compute the AUC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "平均AUC值:  0.5307477107962685\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAugAAAHwCAYAAAD0N5r7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAWJQAAFiUBSVIk8AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAgAElEQVR4nOzdfXhU1b33/883CRgg4SFE4gMigUmEVtAjQtBpVUyLtj0ePdRi7ypV+mi1hZ+1PbfW3sfau62e0x4VrJ4+qPVorUX7o9pnbVOtbawJoggqmASCoGDoEIEkJkAy6/4jkyEJSWYy2ZnZM3m/rotrZ/baa+3vzGj4uF17bXPOCQAAAIA/ZKW6AAAAAABHENABAAAAHyGgAwAAAD5CQAcAAAB8hIAOAAAA+AgBHQAAAPARAjoAAADgIwR0AAAAwEcI6AAAAICPENABAAAAHyGgAwAAAD5CQAcAAAB8JCfVBSSTmdVLGi9pe4pLAQAAQGabLumAc654sB1HVECXNH7MmDEFs2fPLkh1IQAAAMhcmzdvVmtra0J9R1pA3z579uyC9evXp7oOAAAAZLB58+bpxRdf3J5IX+agAwAAAD5CQAcAAAB8hIAOAAAA+AgBHQAAAPARAjoAAADgIwR0AAAAwEcI6AAAAICPENABAAAAHyGgAwAAAD5CQAcAAAB8JCfVBQAAkAlqGppUWRdSc1u78nJzFAwUqrQo3/M+yawvHWTqZ5gO31c6fB7p8Dn2hYAOAMAQVNaFtKqiVtX1jUe1LSgu0MryEgUDhUPuk8z60kGmfobp8H2lw+eRDp/jQMw5N/RBzC6VdK6k0yWdJilf0sPOuSsSGGuqpG9KulDSZEm7JT0u6Rbn3DtDrHP9GWecccb69euHMgwAAJKkNet26Ma1mxQe4K/SLJNuWzJXS+eflHCfZNaXDjL1M0yH7ysdPg+/fI7z5s3Tiy+++KJzbt5g+3o1B/3rkr6ozoD+VqKDmNlMSeslLZdULekOSdskrZT0dzObPPRSAQAYusq6UMwQIElhJ92wdqMq60IJ9UlmfekgUz/DdPi+0uHzSIfPMR5eBfTrJJVKGi/pC0MY5x5JUyStcM5d4py7wTl3vjqD+imSvj3kSgEA8MCqitqYIaBL2EmrK2oT6pPM+tJBpn6G6fB9pcPnkQ6fYzw8mYPunHu662czS2gMM5shabGk7ZLu7tV8s6TPSVpmZtc751oSqxQAgKGraWjqc27rQKoGeXxXn827Dwz6prZE60vkXMmUzPeVqedKVDp8Hn94ZXdC/Woamnz3z72fbhI9P7J9yjkX7t7gnGsys0p1BviFkioGGsjM+ptkPmvIVQIARrxk/m/xD636a0aeK5ky9TNMh+8rmTVe/dMXE+pXWRfyXUD30zrop0S2Nf20d/0/iNIk1AIAQL+a29pTXQIAj/jx32c/XUGfENnu76e9a//EWAP1d7ds5Mr6GYMvDQCAI/Jyk/fXp0ka7OxR56RE1mhL5FzJlMz3lannSlQ6fB6JSua/z/HyX0X96/p6k/mdAQBwlGSun/zkdeckNId38R3PJuVcyZTM95Wp50pUOnweP1o2T597aPBLaftxPXQ/TXHpukI+oZ/28b2OAwAgJUqL8jWjcNyg+pQVF2hBccGg+yQSwEqL8pN2rmRK5vvK1HMlKh0+j8XvPc73n2O8/BTQX49s+5tjXhLZ9jdHHQCApPj1y7tUH4p/QbEsk1aUl2hleYmy4vzf/V19EpXMcyVTpn6G6fB9pcPnkQ6fYzz8FNC7lmpcbGY96jKzfElBSa2Snk92YQAAdPnjaw26bs2GuOdbdj2xMBgoVDBQqFuXzIkZILr3SVQyz5VMmfoZpsP3lQ6fRzp8jvFI+hx0Mxslaaakw865rV37nXNbzewpdS6leK2ku7p1u0XSOEk/ZA10AECqPFvzD1378ItqjzwJJTAlT9d/sFQP
PLe9z3XOy4oLtKK8pEcIuGz+NE2dNFarK2rj7pOoZJ4rmTL1M0yH7ysdPo90+BxjMeeGfs+lmV0i6ZLIy+MkXSBpm6SuxS9DzrmvRI6dLqle0hvOuem9xpkp6Tl1Pk30CUmbJZVJWqTOqS1nO+f2DqHO9WecccYZ69cP/gYCAMDIVrVtr678SbXaDnc+quPkyWP12OfP0pTxuZI6b2yrrAupua1debk5CgYKY85tTaRPopJ5rmTK1M8wHb6vdPg8Uvk5zps3Ty+++OKL/a0uOBCvAvo31Pm0z/5Ew/hAAT3SfpKkb0q6UNJkSbslPS7pFufc4B/D1nNsAjoAYNBe2vGOrri3Si2HOiRJJ0zI1aNXn6Wpk8amuDIAfjWUgO7JFBfn3DckfSPOY7fryJKJfbXvlLTci7oAABiqV3ft15X3V0fD+bH5x+hnn11IOAcwbPx0kygAAL5S29CkZfdV60DkSYMF40brZ58p0/RBLrEIAINBQAcAoA/bQy26/N4qNbYckiSNz83Rg59aoBKfzQMGkHkI6AAA9PLWvlZdfm+V9jQdlCSNG52tBz61QKee2N+z9ADAOwR0AAC62XOgTZf/+Hm9ta9VknRMTpbuu2q+zpg2KcWVARgpCOgAAETsbT6oy++t0va970qSRmdn6UefPFMLZ0xOcWUARhICOgAAkva3HtYn769W7Z5mSVJ2lun7n/gnnVt6bIorAzDSENABACNe88F2XfWTar2664AkyUy6felpWvze41JcGYCRiIAOABjR2g536DP/s04v7dgX3fcfS+bq4tNPTGFVAEYyAjoAYMQ62N6hzz+0Xs9vO/Kg6lv+5b1aOv+kFFYFYKQjoAMARqTDHWF96Wcv6S81/4juu+FDs3Tl2dNTVxQAiIAOABiBOsJO1z/6sp56rSG6b0V5ia4+d2YKqwKATgR0AMCIEg47fW3tJv3q5V3RfZ99f7Gu+0BJCqsCgCNyUl0AAADDpaahSZV1ITW3tSsvN0dnz5ysn1Xt0JoXdkaPuWLhNH3tw7NlZimsFACOIKADADJOZV1IqypqVV3fOOBxl86bqm/+y6mEcwC+QkAHAGSUNet26Ma1mxR2Ax83d+oE/cdH5yori3AOwF+Ygw4AyBiVdaG4wrkkvfLWfj2/be/wFwUAg0RABwBkjFUVtXGFc0kKO2l1Re3wFgQACSCgAwAyQk1DU8w5571V1TeqpqFpmCoCgMQQ0AEAGaGyLpTUfgAwXAjoAICM0NzWntR+ADBcCOgAgIyQl5vYwmSJ9gOA4UJABwBkhGCgMKn9AGC4ENABABmhtChfC4oLBtWnrLhApUX5w1QRACSGgA4AyBgry0sU73OHskxaUV4yvAUBQAII6ACAjBEMFOrWJXNihvQsk25bMpfpLQB8iTtjAAAZ5bL50zR10litrqhVVR/ropcVF2hFeQnhHIBvEdABABknGChUMFCoWV//vdraw5KkGz80S4tmTWHOOQDfI6ADADJWVre5LlcsPFnjjuGvPQD+xxx0AAAAwEcI6AAAAICPENABAAAAHyGgAwAAAD5CQAcAAAB8hIAOAAAA+AgBHQAAAPARAjoAAADgIwR0AAAAwEcI6AAAAICPENABAAAAHyGgAwAAAD6Sk+oCAADpqaahSZV1ITW3tSsvN0fBQKFKi/J9da5w2EV//unzb2jRrCnDViMAeIWADgAYlMq6kFZV1Kq6vvGotgXFBVpZXqJgoDCl5+rq19Yeju679fdbdOvvt3heIwB4jSkuAIC4rVm3Q8vuq+ozMEtSdX2jlt1XpUfX7UzZuZJZIwAMBwI6ACAulXUh3bh2k7rNGulT2Ek3rN2oyrpQ0s+VzBoBYLgQ0AEAcVlVURsz+HYJO2l1RW3Sz5XMGgFguDAHHQAQU01DU79TRvpTVd+or/7iZRWMGz2ofo0thxI61+ceeiGhfjUNTdw4CsBXCOgAgJgSnQry2AtvelxJ/556tSGhfpV1IQI6AF9higsA
IKbmtvZUlzBsMvm9AUhPXEEHAMSUl5vYXxfls6bozOkFg+rzwvZGVWzZM+hzlU7JU82e5kH3S/S9AcBw4bcSACCmRNcM/98fmjXo6SM1DVMSCuhfueAUfe6h9YPux3roAPyGKS4AgJhKi/K1oHhwV8LLigsSmtud6LkWv/e4pNUIAMOJgA4AiMvK8hJlWXzHZpm0orwk6edKZo0AMFwI6ACAuAQDhbp1yZyYATjLpNuWzB3S1JFEz5XMGgFguDAHHQAQt8vmT9PUSWO1uqJWVX2sOV5WXKAV5SWeBN9Ez5XMGgFgOBDQAQCDEgwUKhgo1Fd/8XJ0nfPyWVMSuiE03nPVNDSpsi6k5rZ25eXmKBgoHPBcifYDAD8goAMAEtL9CaFnTh/emy1Li/ITvuGUQA4g3TAHHQAAAPARAjoAAADgIwR0AAAAwEcI6AAAAICPENABAAAAHyGgAwAAAD5CQAcAAAB8hIAOAAAA+IhnAd3MpprZ/Wa2y8wOmtl2M7vTzCYNcpz3mdkTkf5tZrbDzH5nZhd6VSsAAADgV54EdDObKWm9pOWSqiXdIWmbpJWS/m5mk+Mc5wuS/iqpPLK9Q9JfJJ0r6fdmdpMX9QIAAAB+lePROPdImiJphXPurq6dZna7pOskfVvS1QMNYGajJN0qqU3SPOfc693aviPpJUk3mdn3nHMHPaobAAAA8JUhX0E3sxmSFkvaLunuXs03S2qRtMzMxsUYqkDSBEk13cO5JDnnNkuqkTRGUt5QawYAAAD8yospLudHtk8558LdG5xzTZIqJY2VtDDGOHsk/UNSqZmVdG8ws1JJJZI2OOf2elAzAAAA4EteTHE5JbKt6ae9Vp1X2EslVfQ3iHPOmdm1kn4qab2Z/VLSLkknSvpXSa9K+ng8BZnZ+n6aZsXTHwAAAEgVLwL6hMh2fz/tXfsnxhrIOfeYme2S9IikT3ZrapD0E3XeeAoAAABkrGSsg26RrYt5oNkVkv6kzhVcZqtzasxsdV55/76kn8dzQufcvL7+SNqSyBsAAAAAksWLgN51hXxCP+3jex3Xp8g88/vVOZVlmXNui3Ou1Tm3RdIydS7j+DEzO2/oJQMAAAD+5EVA71pxpbSf9q4bPvubo95lsaRRkv7Sx82mYUnPRl7OS6RIAAAAIB14EdCfjmwXm1mP8cwsX1JQUquk52OMc0xke2w/7V37DyVSJAAAAJAOhhzQnXNbJT0labqka3s13yJpnKQHnXMtXTvNbJaZ9V5R5a+R7aVmNrd7g5mdLulSdc5j//NQawYAAAD8yqsniV4j6TlJq82sXNJmSWWSFqlzastNvY7fHNl23UAq51y1mf1E0nJJ6yLLLL6hzuB/iaTRku50zr3qUc0AAACA73gS0J1zW83sTEnflHShpA9L2i1ptaRbnHONcQ71aXXONb9K0gWS8iUdkPQ3ST92zsW1igsAAACQrry6gi7n3E51Xv2O51jrZ7+T9EDkDwAAADDiJGMddAAAAABxIqADAAAAPkJABwAAAHyEgA4AAAD4CAEdAAAA8BECOgAAAOAjBHQAAADARwjoAAAAgI8Q0AEAAAAfIaADAAAAPkJABwAAAHyEgA4AAAD4CAEdAAAA8BECOgAAAOAjBHQAAADARwjoAAAAgI8Q0AEAAAAfyUl1AQAA79Q0NKmyLqTmtnbl5eYoGChUaVH+sJyrseVQ9OcXtjeqpmHKsJ0LAEYSAjoAZIDKupBWVdSqur7xqLYFxQVaWV6iYKBw2M5VsWWPKrbs8fxcADASMcUFANLcmnU7tOy+qj7DuSRV1zdq2X1VenTdzrQ6FwCMVAR0AEhjlXUh3bh2k8Ju4OPCTrph7UZV1oXS4lwAMJIR0AEgja2qqI0ZmLuEnbS6ojYtzgUAIxlz0AEgTdU0NPU71aQ/VfWNenHHOwpMyRtUv7o9zQmdq6ahiRtHAWCQCOgAkKYSnUKy5J7nPK6kf5V1IQI6AAwSU1wAIE01t7WnuoSY
0qFGAPAbrqADQJrKy03sV/gx2VkanTO46zOH2sM62BEe9LkSrREARjJ+cwJAmjrtpIkJ9fv1ivcNetpJTUOTFt/x7KDPxXroADB4THEBgDRUtW2vvvSzlwbdr6y4IKE54aVF+VpQXJCUcwHASEdAB4A0crC9Q7f+brM+/uPn9da+1kH1zTJpRXlJwudeWV6iLEvOuQBgJCOgA0Ca2PL2AV38/Ur98NltcpH1yCeMGaVPlE2LGZyzTLptydwhTTkJBgp165I5STkXAIxkzEEHAJ8Lh53u+1u9vvvk6zrU7UbN95cU6ruXnqbjJuTqI3OO1+qKWlX1sVZ5WXGBVpSXeBKYL5s/TVMnjU3KuQBgpCKgA4CPvfnOu/rKYy/r+W1HwvAxOVn62odna9nCk5UVuZwdDBQqGChUTUOTKutCam5rV15ujoKBQs/ngSfzXAAwEhHQAcCHnHP65Utv6eYnXlXTwSNric85cYLuuOz0fp8EWlqUn7SQnMxzAcBIQkAHAJ95p+WQbnp8k3636e3oviyTvrgooC+Vl2hUNrcPAUAmI6ADgI888/oe/dsvNmpP08HovpMnj9XtS0/XvJMnpbAyAECyENABwAdaD3XoO7/brIeef6PH/v+1YJq+/pHZGncMv64BYKTgNz4ApNiGnfv05TUbtC3UEt1XmDda//HRuSqfXZTCygAAqUBAB4AUae8I6+6nt2r1n2vVEXbR/R98T5FuWzJHk/OOSWF1AIBUIaADQArUh1p03ZoN2rBzX3TfuNHZuvmi9+pjZ06VWZyP7AQAZBwCOgAkkXNOP6veoW/9ZrNaD3dE95958iTdvvR0TZs8NoXVAQD8gIAOAEmyp6lN//sXG/X06/+I7huVbbrug6X6/DkzlZ3FVXMAAAEdAJLiD6/s1o1rN+mddw9H95VMydMdl52uU0+ckMLKAAB+Q0AHgGHU1HZYt/z6Nf1i/Zs99n/6fcX66gWnKHdUdooqAwD4FQEdAIZJ1ba9+vKjL+utfa3RfcdPyNX3PnaagoHCFFYGAPAzAjoAeOxge4du/2ONfvTsNrkjqyfqktNP0C0Xn6oJY0alrjgAgO8R0AHAQ1vePqD/7+cbtOXtpui+8bk5+va/ztFFp52QwsoAAOmCgA4AcappaFJlXUjNbe3Ky81RMFCo0qJ8SVI47HTf3+r13Sdf16GOcLTP+wKF+u7H5ur4CWNSVTYAIM0Q0AEghsq6kFZV1Kq6vvGotgXFBbq8bJoeqd6h57cdaT8mJ0s3fmiWPnnWdGWxfCIAYBAI6AAwgDXrdujGtZsUdn23V9c3HhXcTz1xvO5YerpKIlfXAQAYDAI6APSjsi40YDjvzSRduyigFeUlGp2TNay1AQAyFwEdAPqxqqI27nAuSe85Yby+csEpw1cQAGBE4BIPAPShpqGpzznnA3l11wHVNDTFPhAAgAEQ0AGgD5V1oaT2AwCgCwEdAPrQ3Nae1H4AAHQhoANAH/JyE7tFJ9F+AAB0IaADQB+CgcKk9gMAoAsBHQD6UFqUrwXFBYPqU1ZcEH2yKAAAiSKgA0A/VpaXKN6HgGaZtKK8ZHgLAgCMCAR0AOhHMFCoW5fMiRnSs0y6bclcprcAADzB3UwAMIDL5k/T1EljtbqiVlV9rIteVlygFeUlhHMAgGcI6AAQQzBQqGCgUE+++rY+/9B6SdKx+aP18GcWMuccAOA5prgAQJymTx4X/XnS2NGEcwDAsCCgAwAAAD5CQAcAAAB8xLOAbmZTzex+M9tlZgfNbLuZ3WlmkxIYa46ZPWhmOyNj7TGzv5jZJ72qFwAAAPAjT24SNbOZkp6TNEXSE5K2SFogaaWkC80s6JzbG+dYV0m6V9K7kn4jabukiZJOlfRhSQ96UTMAAADgR16t4nKPOsP5CufcXV07zex2SddJ+rakq2MNYmYL1RnOX5F0oXPu7V7tozyqFwAAAPClIU9xMbMZkhar80r33b2ab5bUImmZmY1TbP8pKVvS
Fb3DuSQ55w4PrVoAAADA37y4gn5+ZPuUcy7cvcE512RmleoM8AslVfQ3iJlNlfR+SS9IetXMFkmaJ8lJ2iDp6d7jAwAAAJnGi4B+SmRb0097rToDeqkGCOiS5nc7/s+SzuvVvsnMljjn6mIVZGbr+2maFasvAAAAkEperOIyIbLd30971/6JMcaZEtkulTRb0pLI2AFJD0maI+m3ZjY68VIBAAAAf/PqJtGBWGTrYhyX3W37GefcbyKvD5jZleoM7WdK+qikRwYayDk3r89COq+snxFP0QAAAEAqeHEFvesK+YR+2sf3Oq4/70S2ByX9rnuDc86pc/lGqXP5RgAAACAjeRHQX49sS/tpL4ls+5uj3nucpn5uBu0K8GMGURsAAACQVrwI6E9HtovNrMd4ZpYvKSipVdLzMcbZKCkkqdDMivpoPzWy3Z54qQCQuO17W6I/v/PuIdU0NKWwGgBAphpyQHfObZX0lKTpkq7t1XyLpHGSHnTORf9mM7NZZtZjRRXnXLukH0Ze/mf3sG9mcyRdJald0i+GWjMADEZlXUhLf/h3ff6hIwtE/aPpkBbf8ayW/vDvqqwLpbA6AECm8eom0WskPSdptZmVS9osqUzSInVObbmp1/GbI1vrtf87ksolfVLSHDN7RtKx6rwxNFfS9fEsswgAXlmzboduXLtJ4X5uc6+ub9Sy+6p025K5Wjr/pOQWBwDISF5Mcem6in6mpAfUGcyvlzRT0mpJZznn9sY5zrvqDOi3SBqrzivy/6LO8P9h59ztXtQLAPGorAsNGM67hJ10w9qNXEkHAHjCs2UWnXM7JS2P89jeV867t70r6RuRPwCQMqsqamOG8y5hJ62uqFUwUDi8RQEAMp4nV9ABINPUNDSpur5xUH2q6hu5cRQAMGQEdADoQ6LTVZjmAgAYKgI6APShua09qf0AAOhCQAeAPuxrPZxQv7xcz27tAQCMUPxNAgDdbHn7gP7rqRr98bWGhPpzkygAYKgI6AAgaXuoRXf8qUa/enmXXJwrt/RWVlyg0qJ8bwsDAIw4BHQAI9qufa2668+1evSFN9XRa03FhTMmq7p+b1xLLWaZtKK8ZJiqBACMJAR0ACNSqPmg7nl6q35a9YYOtYd7tH1g9hR9+YOn6D0njI/5JFGpM5zftmQu01sAAJ4goAMYUfa3HtaPn92m+yvr9e6hjh5tZ8+crK9ccIrOmDYpuu+y+dM0ddJYra6oVVUf66KXFRdoRXkJ4RwA4BkCOoAR4d1D7Xrgue36wTNbdaDXUoinnzRRX73glH5DdjBQqGCgUDUNTaqsC6m5rV15uTkKBgqZcw4A8BwBHUBGO9jeoUeqduj7T29VqPlgj7ZZx+Xr+sWn6AOzp8jMYo5VWpRPIAcADDsCOoCM1N4R1toX39Kqilq9ta+1R9v0yWN13QdLddHcE5SVFTuYAwCQTAR0ABklHHb67abduuOPNdoWaunRdvyEXK0sL9FH503VqGye0wYA8CcCOoCM4JzT06/v0XefrNHm3Qd6tE0eN1rXLgroE2XTlDsqO0UVAgAQHwI6gLT396179d0nt+jFHft67M/PzdHnz5mh5cFijTuGX3cAgPTA31gA0taGnfv0vSdf19/qQj32jxmVreXB6fr8OTM1YeyoFFUHAEBiCOgA0s7rbzfpv556XU+91tBj/+jsLH2ibJquWTRTU/JzU1QdAABDQ0AHkDa2h1p0559q9MTLu+S6Pdkzy6SPzTtJKz5QohMnjkldgQAAeICADsD3du9v1eqKOj36wk51hF2Ptn+ee7yu+2CpZh6bl6LqAADwFgEdgG/tbT6oe57Zqoeef0OH2sM92spnTdGXF5fqvSdMSFF1AAAMDwI6AN/Z33pY9/51m+7/W71aDnX0aFs4o0BfvWCW5p08KUXVAQAwvAjoAHzj3UPteuC57frhX7Zpf+vhHm2nnTRRX118ioKByTLj6Z8AgMxFQAeQcgfbO/Tz6p266891CjUf
7NF2SlG+rl9cqg++p4hgDgAYEQjoAAZU09CkyrqQmtvalZebo2CgUKVF+Z70ae8Ia+1Lb2nVn2r11r7WHm0nTx6rL3+wVP889wRlZxHMAQAjBwEdQJ8q60JaVVGr6vrGo9oWFBdoZXmJgoHChPqEw06/e2W3bv9jjbb9o6XHcceNz9XKD5To0nlTNSo7y9s3BQBAGiCgAzjKmnU7dOPaTeq1omFUdX2jlt1XpduWzNXS+ScNqs+VZ09X1bZGvbb7QI/2gnGjde2igC4vm6bcUdlevh0AANIKAR1AD5V1oQGDdpewk25Yu1EnTup8MFC8fX5Sub3HvvzcHH3u/TO0/H3FyjuGX0kAAPC3IYAeVlXUxgzaXcJOWl1RKxf5eTByR2VpebBYnz9nhiaOHT3oOgEAyFQEdABRNQ1Nfc4fH0jVII/v8pPl83XWjMLYBwIAMMJwBxaAqMq6UNLOtWV3U9LOBQBAOiGgA4hqbmvPyHMBAJBOCOgAovJykzfrLZnnAgAgnRDQAUT1Xtc8U84FAEA6IaADiCotyteC4oJB9SkrLkioT6ynkQIAMFIR0AH0sLK8RFkW37FZJq0oL0moDwAA6BsBHUAPwUChbl0yJ2bgzjLptiVzFQwUJtQHAAD0jbu0ABzlsvnTNHXSWK2uqO1znfOy4gKtKC/pEbQT6QMAAI5GQAfQp64r4+d/7xltC7VIkj53zgxdOm9qv/PHu/rUNDSpsi6k5rZ25eXmKBgoZM45AABxIqADGFDuqOzoz/9y2glxBe3SonwCOQAACWIOOgAAAOAjBHQAAADARwjoAAAAgI8Q0AEAAAAfIaADAAAAPkJABwAAAHyEgA4AAAD4CAEdAAAA8BECOgAAAOAjBHQAAADARwjoAAAAgI8Q0AEAAAAfIaADGFDb4Y7oz796eZdqGppSWA0AAJkvJ9UFAPCnyrqQVlXUaluoJbrvR89u04+e3aYFxQVaWV6iYKAwhRUCAJCZuIIO4Chr1u3QsvuqVF3f2Gd7dX2jlt1XpUfX7UxyZQAAZD4COoAeKutCunHtJoXdwMeFnXTD2o2qrAslpzAAAEYIAjqAHlZV1MYM513CTlpdUTu8BQEAMMIQ0AFE1TQ09TutpT9V9Y3cOAoAgI2tyogAACAASURBVIcI6ACiEp2uwjQXAAC8Q0AHENXc1p7UfgAA4GgEdABRebmJrbyaaD8AAHA0AjqAqETXNWc9dAAAvENABxBVWpSvGYXjBtWnrLhApUX5w1QRAAAjDwEdQNRvN+5Wfbcnh8aSZdKK8pJhrAgAgJGHgA5AkvSn1xq08ucvKc4l0JVl0m1L5jK9BQAAj3kW0M1sqpndb2a7zOygmW03szvNbNIQxjzHzDrMzJnZt7yqFUBPf6sN6ZqHX1R75AlFM48dp3suP0NlxQV9Hl9WXKCHPl2mpfNPSmaZAACMCJ4svWBmMyU9J2mKpCckbZG0QNJKSReaWdA5t3eQY+ZL+h9J70rK86JOAEerrm/UZx98QYc6wpKkaQVj9fBnFuq4Cbn68JzjVdPQpMq6kJrb2pWXm6NgoJA55wAADCOv1ka7R53hfIVz7q6unWZ2u6TrJH1b0tWDHHOVpAmSbo30B+CxDTv36VMPrFPr4Q5J0gkTcvXwZ8p03ITc6DGlRfkEcgAAkmjIU1zMbIakxZK2S7q7V/PNklokLTOzuJeGMLOLJS2XtELSrqHWCOBom3cf0JX3V6v5YOdDho7NP0YPf3ahTioYm+LKAAAY2byYg35+ZPuUcy7cvcE51ySpUtJYSQvjGczMpkj6saTHnXM/9aA+AL3U7WnWFfdWaX/rYUnSpLGj9PBnylQ8yCUWAQCA97yY4nJKZFvTT3utOq+wl0qqiGO8H6nzPxwGOyUmyszW99M0K9ExgUzxxt4WXX7v89rbckiSlJ+bo4c+XcY0FgAAfMKLgD4hst3fT3vX/omxBjKzT0m6WNJlzrkGD2oD0M2ufa36xI+r1HDgoCRp
7OhsPbB8gU49cUKMngAAIFm8ukl0IBbZDri8splNl3SnpMecc48O5YTOuXn9nGO9pDOGMjaQrvY0tenye6v01r5WSdIxOVm698ozNe/khFdCBQAAw8CLOehdV8j7uwQ3vtdx/blfUqukazyoCUA3jS2HtOze6uhTQkdlm364bJ7OnslDhgAA8BsvAvrrkW1pP+1dzwHvb456lzPUuVTjPyIPJnJm5iT9JNJ+U2Tf40MrFxhZ9rce1ifvr9LrDU2SpOws013/6wydd8qUFFcGAAD64sUUl6cj28VmltV9JZfIw4aC6rwy/nyMcR5U52ovvZVIOkfSBknrJb005IqBEaLlYLuW/6Rar7x1QJJkJt2+9DRdeOpxKa4MAAD0Z8gB3Tm31cyeUudKLddKuqtb8y2Sxkn6oXOupWunmc2K9N3SbZwVfY1vZlepM6D/1jn39aHWC4wUbYc79Jn/eUEv7tgX3Xfbkjm6+PQTU1gVAACIxaubRK+R9Jyk1WZWLmmzpDJJi9Q5teWmXsdvjmxNADx3sL1DV/90vf6+bW903zcueo8umz8thVUBAIB4eDEHXc65rZLOlPSAOoP59ZJmSlot6Szn3N7+ewPwUntHWCsf2aBnXv9HdN+/XXiKrgoWp7AqAAAQL8+WWXTO7ZS0PM5j475y7px7QJ3BH0AMHWGnrzz2sv7w6tvRfSvOD+ia8wIprAoAAAyGJ1fQAaSec043/XKTHt+wK7rvM+8r1nUf7G+BJQAA4EfJeFARkDI1DU2qrAupua1debk5CgYKM+KR9r3f19kzJ+uR6p36+bqd0WMuL5ummz4yW2bc6gEAQDohoCMjVdaFtKqiVtX1jUe1LSgu0MryEgUD6feQnoHeV3dLzjhR//fiUwnnAACkIaa4IOOsWbdDy+6r6jfEVtc3atl9VXq029XmdBDrfXWZe+IE/edH5yori3AOAEA6IqAjo1TWhXTj2k0Ku4GPCzvphrUbVVkXSk5hQxTv+5KkV3btV1WMEA8AAPyLgI6MsqqiNq4QK3WG9NUVtcNbkEcy9X0BAICjMQcdGaOmoSnm9I/equobtfA7FRqd49//Vj3UHtbbB9oG1aeqvlE1DU0ZcUMsAAAjDQEdGSPR6SqDDb/porIuREAHACAN+feyITBIzW3tqS7BV/g8AABIT1xBR8bIy03sH+cvnR/QpfOmelyNd36x/k3d9ee6QfdL9PMAAACpxd/gyBiJrmt+0Wkn6OTJ4zyuxjsXnXZCQgE9Hdd5BwAATHFBBiktyteC4oJB9SkrLvD9PO1MfV8AAKBvBHRklJXlJYr38TxZJq0oLxnWeryysrxE8T53KJ3eFwAAOBoBHRnljGmTNHZ0dszjsky6bcnctJkGEgwU6tYlc2KG9HR7XwAA4GjMQUdGWbNuh1oOdUiSRmebDnUc/XSfsuICrSgvSbsQe9n8aZo6aaxWV9T2+aTQdH1fAACgJwI6Msah9rB++Oy26OubPvIenTVzsirrQmpua1debo6CgcK0npsdDBQqGChUTUNTRr0vAABwBAEdGePxl97S7v2dDx0qzButy+afpNxR2RkZXEuL8jPyfQEAAOagI0N0hJ3++y9bo68//b4Zyh0Vey46AACA3xDQkRF+u2m36kMtkqTxuTm6YuG0FFcEAACQGAI60p5zTvc8feRBPledPV35uaNSWBEAAEDiCOhIexWb92jL202SpLGjs7U8WJziigAAABJHQEdac87p+92unl9eNk2Txo1OYUUAAABDQ0BHWvv71r3asHOfJGl0dpY+8/4ZKa4IAABgaAjoSGvdr55/7MypKhqfm8JqAAAAho6AjrT14o539NzWvZKk7CzT1efOTHFFAAAAQ0dAR9rqvnLLxaedoJMKxqawGgAAAG8Q0JGWNu8+oD9t3iNJMpOuWcTVcwAAkBkI6EhLd3e7en7Be45TYAqPvQcAAJmBgI60s+0fzfrtpt3R19cuCqSwGgAAAG/lpLoAIB41
DU2qrAupua1dFVsa5Fzn/nNLj9WcqRNSWxwAAICHCOjwtcq6kFZV1Kq6vrHP9nNKjk1yRQAAAMOLgA7fWrNuh25cu0lh1/8x3/7da8rPzdHS+SclrzAAAIBhxBx0+FJlXShmOJeksJNuWLtRlXWh5BQGAAAwzAjo8KVVFbUxw3mXsJNWV9QOb0EAAABJQkCH79Q0NPU757w/VfWNqmloGqaKAAAAkoeADt9JdLoK01wAAEAmIKDDd5rb2pPaDwAAwE8I6PCdvNzEFhdKtB8AAICfENDhK+Gw056mtoT6BgOFHlcDAACQfFxyhG/s3t+qrz62UX9LYC55WXGBSovyh6EqAACA5CKgwxd+9fIuff2Xm3QggXnkWSatKC8ZhqoAAACSj4COlNr/7mH9nyde0a9e3hXdZyZdfe5MnThxjP79iVcGXA89y6TblsxlegsAAMgYBHSkzN9qQ/rKYy/r7QNH5pxPnTRGty89XQuKCyRJxYXjtLqiVlV9rIteVlygFeUlhHMAAJBRCOhIurbDHfqPP2zRTyq399i/9Myp+j///B7l546K7gsGChUMFKqmoUmVdSE1t7UrLzdHwUAhc84BAEBGIqAjqV55a7+uW7NBtXuao/sKxo3Wd/51ji489bh++5UW5RPIAQDAiEBAR1J0hJ1+8JetuvNPNTrccWRS+fmzpui2j87RlPzcFFYHAADgHwR0DLsde9/Vlx/doBfeeCe6b8yobH39n2frEwumycxSWB0AAIC/ENAxbJxzevSFnfrmr19Ty6GO6P7TT5qoOy47XcWF41JYHQAAgD8R0DEsQs0HdePaTfrjaw3RfdlZppXlJbrmvJnKyeYhtgAAAH0hoMNzf3qtQTes3ahQ86HovhnHjtMdS0/XaSdNTGFlAAAA/kdAh2daDrbrW799TY9U7+yx/8qzTtYNH5qtMaOzU1QZAABA+iCgwxPr33hHX350g97Y+25035T8Y/Tdj52mc0uPTWFlAAAA6YWAjiE53BHW6opa3f10ncJHVk/UR+Ycr29dcqomjRuduuIAAADSEAEdCavb06Tr1rysTW/tj+7Lz83R/734VF18+gksnwgAAJAAAjoGLRx2evDv23Xr77foYHs4uv+sGZP1vaWn6cSJY1JXHAAAQJojoGNQ3t7fpq/+4mX9tTYU3Tc6O0v/duEp+lSwWFlZXDUHAAAYCgI64vabjbt00y9f0f7Ww9F9s47L150fP12zjhufwsoAAAAyBwE9w9Q0NKmyLqTmtnbl5eYoGChUaVH+kPrsbz2sm594RY9v2BXdZyZ97pwZ+vIHS3VMDssnAgAAeIWAniEq60JaVVGr6vrGo9oWFBdoZXmJgoHCQfcxSdc/9rJ272+Ltp04cYxuX3qaymZM9vx9AAAAjHQE9AywZt0O3bh2U49lDrurrm/UsvuqdNuSuVo6/6S4+1xxb5V6N186b6puvug9ys8d5d0bAAAAQBQBPc1V1oUGDNpdwk66Ye1GnTipc4WVePp0b540dpRuXTJHF556/NAKBgAAwIAI6GluVUVtzKDdJeyk1RW1cpGf4zVxzCg9ed05mpKfm1CNAAAAiB8BPY3VNDT1OX98IFWDPF6S9rUe1r53DxPQAQAAkiAr1QUgcZV1odgHpeG5AAAARjLPArqZTTWz+81sl5kdNLPtZnanmU2Ks/84M7vczH5mZlvMrMXMmszsBTO73sxGe1Vrpmhua8/IcwEAAIxknkxxMbOZkp6TNEXSE5K2SFogaaWkC80s6JzbG2OY90v6qaRGSU9LelxSgaSLJH1P0hIzK3fOtfU/xMiSl5u8GUrJPBcAAMBI5lXquked4XyFc+6urp1mdruk6yR9W9LVMcZ4W9IVkh5zzh3qNka+pGcknS3pWkn/5VHNaa/3uuaZci4AAICRbMhTXMxshqTFkrZLurtX882SWiQtM7NxA43jnNvgnHu4eziP7G/SkVB+3lDrzSSlRflaUFwwqD5lxQUJ9Yn1
NFIAAAB4w4s56OdHtk8558LdGyLhulLSWEkLh3COw5EtE6F7WVleoiyL79gsk1aUlyTUBwAAAMnhRUA/JbKt6ae9NrItHcI5PhXZ/iGeg81sfV9/JM0aQg2+FAwU6tYlc2IG7iyTblsyV8FAYUJ9AAAAkBxezEGfENnu76e9a//ERAY3sy9KulDSBkn3JzJGprts/jRNnTRWqytq+1znvKy4QCvKS3oE7UT6AAAAYPglY2mOruu0g3h2ZaSj2RJJd6rzBtKPOucOx+jSeSLn5vUz3npJZwy2jnTQdWX86ofW6w+vvi1Jumju8fpSeUm/88e7+tQ0NKmyLqTmtnbl5eYoGChkzjkAAECKeBHQu66QT+infXyv4+JiZpdI+rmkPZIWOee2JVbeyDJhzKjoz/EG7dKifAI5AACAT3gxB/31yLa/OeZddxj2N0f9KGb2MUmPSWqQdK5z7vUYXQAAAICM4EVAfzqyXWxmPcaLrGEelNQq6fl4BjOzT0h6RNIudYbz2hhdAAAAgIwx5IDunNsq6SlJ09X5IKHubpE0TtKDzrmWrp1mNsvMjlpRxcyulPSQpB2SzmFaCwAAAEYar24SvUbSc5JWm1m5pM2SyiQtUufUlpt6Hb85so0u9Gdmi9S5SkuWOq/KLzc7ah3Afc65Oz2qGQAAAPAdTwK6c26rmZ0p6ZvqXBLxw5J2S1ot6Rbn3NHr+B3tZB25ov+pfo55Q52rugAAAAAZybNlFp1zOyUtj/PYoy6NO+cekPSAV/UAAAAA6ciLm0QBAAAAeISADgAAAPgIAR0AAADwEQI6AAAA4CMEdAAAAMBHCOgAAACAjxDQM8z+1sPRnyvrQqppaEphNQAAABgsz9ZBR2pV1oW0qqJW1fVHngn164279euNu7WguEAry0sUDBSmsEIAAADEgyvoGWDNuh1adl9Vj3DeXXV9o5bdV6VH1+1McmUAAAAYLAJ6mqusC+nGtZsUdgMfF3bSDWs3qrIulJzCAAAAkBACeppbVVEbM5x3CTtpdUXt8BYEAACAISGgp7GahqZ+p7X0p6q+kRtHAQAAfIyAnsYSna7CNBcAAAD/IqCnsea29qT2AwAAwPAjoKexvNzEVslMtB8AAACGHwE9jSW6rjnroQMAAPgXAT2NlRbla0FxwaD6lBUXqLQof5gqAgAAwFAR0NPcyvISZVl8x2aZtKK8ZHgLAgAAwJAQ0NNcMFCoz50zI+ZxWSbdtmQu01sAAAB8jrsFM8Bruwde17ysuEAryksI5wAAAGmAgJ7mNr65T8/W/ENS51Xye6+crzf2tqi5rV15uTkKBgqZcw4AAJBGCOhp7p6nt0Z//sjcE3T+rCkprAYAAABDxRz0NFbb0KQ/vPp29PW1i2amsBoAAAB4gYCexv77mSNXzz8wu0izjhufwmoAAADgBQJ6mtqx91098fKu6GuungMAAGQGAnqa+sGzW9URdpKkYGCy/mnapBRXBAAAAC8Q0NNQw4E2/eKFN6Ovrz0vkMJqAAAA4CUCehr68bPbdKgjLEn6p2kTddbMySmuCAAAAF4hoKeZxpZDerhqR/T1FxcFZGYprAgAAABeIqCnmQcq69V6uEOSNOu4fNY9BwAAyDAE9DTS1HZYDzy3Pfr6Wq6eAwAAZBwCehp56Pk3dKCtXZJUXDhOH55zfIorAgAAgNcI6Gmi9VCH7vtrffT1F86dqewsrp4DAABkGgJ6mlizbof2thySJJ0wIVeX/NOJKa4IAAAAw4GAngYOtYf1o2e3RV9/7pwZGp3DVwcAAJCJSHlp4PGX3tKu/W2SpMK80fr4gmkprggAAADDhYDucx1hp//+y9bo60+/b4ZyR2WnsCIAAAAMJwK6z/1u027Vh1okSeNzc3TFQq6eAwAAZDICuo8553T303XR11edPV35uaNSWBEAAACGGwHdxyo279GWt5skSWNHZ2t5sDjFFQEAAGC4EdB9yjmn73e7ev6JBdM0adzoFFYEAACA
ZCCg+9Tft+7Vhp37JEmjs7P02XNmpLgiAAAAJAMB3afufubI1fNLz5yqovG5KawGAAAAyUJA96GXdryjyrq9kqTsLNMXzp2Z4ooAAACQLAR0H+q+csvFp52gkwrGprAaAAAAJBMB3Wc27z6gP23eE339hfO4eg4AADCSENB95p5njjw19ML3HqeSovwUVgMAAIBkI6D7SH2oRb/duCv6+tpFgRRWAwAAgFQgoPvID57ZqrDr/Pmc0mM1Z+qE1BYEAACApCOg+8Sufa1a+9Kb0ddf5Oo5AADAiERA94kfPbtNhzs6L5/Pnz5JC4oLUlwRAAAAUoGA7gOh5oP6+bod0dfMPQcAABi5COg+cN/f6tV2OCxJOvXE8Tq39NgUVwQAAIBUIaCn2P7Ww3ro729EX197XkBmlsKKAAAAkEoE9BR78Lntaj7YLkkKTMnTBe89LsUVAQAAIJUI6CnUcrBd91fWR19fc95MZWVx9RwAAGAkI6Cn0CPVO/TOu4clSVMnjdFFp52Q4ooAAACQagT0FDnY3qEf/3Vb9PXV587UqGy+DgAAgJGORJgi///6t9Rw4KAkaUr+Mbp03tQUVwQAAAA/IKCnQHtHWD/4y9bo68++f4ZyR2WnsCIAAAD4BQE9BX69cZd2NL4rSZo4dpQ+UTYtxRUBAADALwjoSRYOO93z9JGr58vPLta4Y3JSWBEAAAD8hICeZE+91qDaPc2SpHGjs3XV2dNTWxAAAAB8hUu3SVDT0KTKupCa2tr1SPWO6P4rzjpZE8aOSmFlAAAA8BvPArqZTZX0TUkXSposabekxyXd4px7ZxDjFEj6d0mXSDpe0l5Jf5D07865N72qNxkq60JaVVGr6vrGo9pM0twTJya/KAAAAPiaJwHdzGZKek7SFElPSNoiaYGklZIuNLOgc25vHONMjoxTKunPkn4uaZak5ZI+YmZnOee2DTCEb6xZt0M3rt2ksOu73Un60iMvquXgXC2df1JSawMAAIB/eTUH/R51hvMVzrlLnHM3OOfOl3SHpFMkfTvOcb6jznB+h3OuPDLOJeoM+lMi5/G9yrrQgOG8S9hJN6zdqMq6UHIKAwAAgO8NOaCb2QxJiyVtl3R3r+abJbVIWmZm42KMM07SssjxN/dq/n5k/Asi5/O1VRW1McN5l7CTVlfUDm9BAAAASBteXEE/P7J9yjkX7t7gnGuSVClprKSFMcY5S9IYSZWRft3HCUt6KvJy0ZArHkY1DU19zjkfSFV9o2oammIfCAAAgIznxRz0UyLbmn7aa9V5hb1UUsUQx1FknAGZ2fp+mmbF6jtUiU5XqawLqbQo3+NqAAAAkG68uII+IbLd30971/5YS5Z4NU5KNbe1J7UfAAAAMksy1kG3yDbOWdlDH8c5N6/PATqvrJ8xxDoGlJeb2EeaaD8AAABkFi+uoHdd2Z7QT/v4XscN9zgpFQwUJrUfAAAAMosXAf31yLa/ueElkW1/c8u9HielSovytaC4YFB9yooLmH8OAAAASd4E9Kcj28Vm1mM8M8uXFJTUKun5GOM8HzkuGOnXfZwsdd5o2v18vrWyvERZFvs4ScoyaUV5SewDAQAAMCIMOaA757aqcwnE6ZKu7dV8i6Rxkh50zrV07TSzWWbWY0UV51yzpIcix3+j1zhfjIz/ZDo8STQYKNStS+bEDOlZJt22ZC7TWwAAABDl1Z2J10h6TtJqMyuXtFlSmTrXLK+RdFOv4zdHtr0j7NcknSfpy2Z2uqRqSbMlXSxpj47+DwDfumz+NE2dNFarK2pV1ce66GXFBVpRXkI4BwAAQA+eBHTn3FYzO1PSNyVdKOnDknZLWi3pFudcXE/ucc7tNbOz1Pkk0UskvV/SXkk/kfTvzrk3vag3WYKBQgUDhappaFJlXUjNbe3Ky81RMFDInHMAAAD0ybO1/ZxzOyUtj/PYfid/RML8ysifjFBalE8gBwAAQFy8uEkUAAAAgEcI6AAAAICPENABAAAAHyGg
AwAAAD5CQAcAAAB8hIAOAAAA+AgBHQAAAPARAjoAAADgIwR0AAAAwEcI6AAAAICPmHMu1TUkjZntHTNmTMHs2bNTXQoAAAAy2ObNm9Xa2tronJs82L4jLaDXSxovaXsKTj8rst2SgnMjefieMx/f8cjA9zwy8D2PDKn6nqdLOuCcKx5sxxEV0FPJzNZLknNuXqprwfDhe858fMcjA9/zyMD3PDKk4/fMHHQAAADARwjoAAAAgI8Q0AEAAAAfIaADAAAAPkJABwAAAHyEVVwAAAAAH+EKOgAAAOAjBHQAAADARwjoAAAAgI8Q0AEAAAAfIaADAAAAPkJABwAAAHyEgA4AAAD4CAE9QWY21czuN7NdZnbQzLab2Z1mNmmQ4xRE+m2PjLMrMu7U4aod8Rvq92xm48zscjP7mZltMbMWM2sysxfM7HozGz3c7wGxefXvc68xzzGzDjNzZvYtL+tFYrz8ns1sjpk9aGY7I2PtMbO/mNknh6N2xMfDv5vfZ2ZPRPq3mdkOM/udmV04XLUjPmZ2qZndZWZ/NbMDkd+xP01wLM9/93uFBxUlwMxmSnpO0hRJT0jaImmBpEWSXpcUdM7tjWOcyZFxSiX9WdI6SbMkXSxpj6SznHPbhuM9IDYvvufIL/PfS2qU9LSkOkkFki6SdFxk/HLnXNswvQ3E4NW/z73GzJe0UVKhpDxJ33bOfd3LujE4Xn7PZnaVpHslvSvpN5K2S5oo6VRJu5xzH/e4fMTBw7+bvyDpHkktkn4p6U1JUyUtkTRW0tedc98ejveA2Mxsg6TTJDWr87uZJelh59wVgxzH89/9nnLO8WeQfyQ9KclJ+lKv/bdH9v8gznF+GDn+9l77V0T2/yHV73Uk//Hie5Z0uqTLJY3utT9f0vrIONen+r2O5D9e/fvcq+/96vyPsq9FxvhWqt/nSP/j4e/thZLaJW2Q/l979xsyWVUHcPz729zEFTNMV4OCJdtHg7aiqEzRdgu2ehEurVshiZv4IlKCyHe9qCAIKmKDelOw0VYUbeAGGbWkpSJItCRL0mpIblGbf0jNwkrZXy/OeWgYZnx2Zs6dex+e7wcuh5k7c+ZcfnPv/c2559zhkgnrN/e9rRt1aXTM3gw8DTwHXDa27nXAvyk/zM7ue3s36kJJoLcDAeyssf1uH9+XLhd70GcUEa8BHqH0mFyamadH1p0HnKJ8abZm5r9epJ5zgSeA08ArM/PZkXWb6mdsq59hL/qStYrzGp9xPfA94CeZ+f6FG62ZdRHniLgWOALcAJwFfAt70HvVMs4RcQ9wNbAjM3/XWaM1k4bn5ouBvwHHM/ONE9YfB3YAF2afvasCICJ2Uq5Oz9SDvoxz/KIcgz67d9Xy6GhAAWqSfR/lEtgVa9TzDuAc4L7R5LzWcxo4Wh/uWrjFmkerOL+Y52v5wgJ1aDFN4xwRW4FvAkcyc64xkepEkzjXuUFXA78BHoyIXRFxW51P8u7auaJ+tNqXH6d0nq1ExPbRFRGxQum5fcDkfN1bxjl+IR5MZndZLR+esv4PtVxZUj3qxjLic1Mtf7ZAHVpM6zh/g3Jc/dgijVJzreL81pHX31WXLwFfBn4BPBARr12gnZpfkxhnGVZwC2U/PhYR346IL0TEIcqwxAeBfQ3aq34NPgc7q68PXsfOr+UzU9avPv/yJdWjbnQan4i4FXgvZRzrwXnqUBPN4hwRN1EmeH8oMx9r0Da10yrOW2v5QeBJyqTBO4GLgM9QhjXdERE7MvO/8zdXc2i2L2fm4Yj4K/B9YPSuPI9Rhqw57HT9G3wOZg96e1HLRQf3t6pH3Zg7PhHxAeAAZZzj3sx8fo23qD9nFOeI2EaJ6eHM/GHHbVJ7Z7o/v2SkvDkzb8/Mf2TmI8CNlKEvK8DebpqpBZzxMTsiPkK5InIvZWLollreCXwN+EFHbdRw9J6DmaDPbvVX1flT1r9s7HVd16NudBKfiNhDObg/Dux0AnDvWsX5IOWuDx9v0Sg1
1yrOT9XyP8BPR1fUoRE/rg/fNmsDtbAmMa7jzA9ShrLckJknMvO5zDxBuUJyDNhXJydq/Rp8DmaC3Ox/7QAAAzFJREFUPruHajltXNLqpJJp45pa16NuNI9PROwDDlMuk74zMx9a4y3qXqs4v5ky/OGJ+qcZGRFJuRwO8On63JHFmqs5tT5uPzs+saxaTeDPmaFtaqNVjHdTbrV494TJg6eBe+rDt8zTSA3G4HMwx6DP7pe13B0RmybcmucqSk/a/WvUc3993VURcd6E2yzuHvs8LVerOK++53rgEPAXYJc954PRKs6HKJfBx20HrqHMNTgG/HbhFmsereJ8nDL2/MKIuHjCXIPX1/LRxZusGbWK8dm1vGjK+tXnnWOwvjU9x3fBHvQZ1bGGRyn3KL9lbPXngHOBQ6P3zYyIyyPi8rF6/gl8p77+s2P13Frr/7mJXD9axbk+fyMl1n8CrjGmw9Fwf/5EZt48vvD/HvQ76nNf72xjNFXDOL9A+YM5gC+O3lYxInYA+ym3Tf1R403QGhoes++t5XUR8YbRFRHxJuA6yrjku9q1Xl2JiM01zpeOPj/P92XZ/KOiOUz4e9jfA2+n3LP8YeDK0Xuk1kvdZGaM1fOKWs8KZWf/NWUiyrWUMcpX1i+RetAizhGxizLZaBNlXOOfJ3zU05l5oKPN0Bpa7c9T6t6Pf1Q0CA2P21sokwWvoFwR+RWlV3UvZWjLpzLzKx1vjiZoGOODwEcpveS3Aycpidwe4KXAgcz8ZMeboynqXK499eElwHsod9ZZ/XH1ZGbeVl+7DfgjcDIzt43VM9P3Zela/SXpRluAV1NOvKcoO/FJ4KvABRNem9Q5RBPWXVDfd7LWc4qSyL2q7210WTzOlB61XGN5tO/t3OhLq/15wmtX4//5vrfRpelxewvlyucJyoTRZyg/xN/X9zZu9KVFjCl38NhP+fH1FOWqyN8pP8w+3Pc2bvSl7ntndE6l/LCaep6d5fuy7MUedEmSJGlAHIMuSZIkDYgJuiRJkjQgJuiSJEnSgJigS5IkSQNigi5JkiQNiAm6JEmSNCAm6JIkSdKAmKBLkiRJA2KCLkmSJA2ICbokSZI0ICbokiRJ0oCYoEuSJEkDYoIuSZIkDYgJuiRJkjQgJuiSJEnSgJigS5IkSQNigi5JkiQNyP8A7+06Fj99dp4AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "image/png": {
       "height": 248,
       "width": 372
      },
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Average AUC: compute one AUC per coupon, then average them\n",
    "vg = valid1.groupby(['Coupon_id'])\n",
    "aucs = []\n",
    "for i in vg:\n",
    "    # i is a tuple: i[0] is the group key, i[1] is the DataFrame of rows for that key\n",
    "    tmpdf = i[1]\n",
    "    # AUC is undefined when a coupon has only one class, so skip that coupon\n",
    "    if len(tmpdf['label'].unique()) != 2:\n",
    "        continue\n",
    "    fpr, tpr, thresholds = roc_curve(tmpdf['label'], tmpdf['pred_prob'], pos_label=1)\n",
    "    aucs.append(auc(fpr, tpr))\n",
    "print(\"平均AUC值: \",np.average(aucs))\n",
    "\n",
    "# fpr/tpr still hold the last coupon that had both classes, so this plots that single ROC curve\n",
    "plt.plot(fpr,tpr,marker = 'o')\n",
    "plt.show()"
   ]
  },
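  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> The per-coupon averaging in the cell above can also be wrapped in a small reusable helper. What follows is only a sketch under stated assumptions, not the notebook's original method: the name `mean_group_auc` and the toy DataFrame are invented for illustration, and the default column names mirror those of `valid1`. It calls `roc_auc_score`, which for binary labels should give the same area as `auc(fpr, tpr)` computed from `roc_curve`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.metrics import roc_auc_score\n",
    "\n",
    "def mean_group_auc(df, group_col='Coupon_id', label_col='label', score_col='pred_prob'):\n",
    "    # Hypothetical helper: average the AUC over groups that contain both classes\n",
    "    aucs = []\n",
    "    for _, group in df.groupby(group_col):\n",
    "        # AUC is undefined for a group with a single class, so skip it\n",
    "        if group[label_col].nunique() != 2:\n",
    "            continue\n",
    "        aucs.append(roc_auc_score(group[label_col], group[score_col]))\n",
    "    return sum(aucs) / len(aucs) if aucs else float('nan')\n",
    "\n",
    "# Toy data, invented for this example; coupon 2 has one class only and is skipped\n",
    "toy = pd.DataFrame({\n",
    "    'Coupon_id': [1, 1, 1, 1, 2, 2, 3, 3],\n",
    "    'label':     [0, 1, 0, 1, 0, 0, 1, 0],\n",
    "    'pred_prob': [0.1, 0.9, 0.8, 0.7, 0.2, 0.3, 0.6, 0.4],\n",
    "})\n",
    "print(mean_group_auc(toy))"
   ]
  },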
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  6.6 Predicting on the test set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>User_id</th>\n",
       "      <th>Coupon_id</th>\n",
       "      <th>Date_received</th>\n",
       "      <th>Probability</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>4129537</td>\n",
       "      <td>9983</td>\n",
       "      <td>20160712</td>\n",
       "      <td>0.103367</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>6949378</td>\n",
       "      <td>3429</td>\n",
       "      <td>20160706</td>\n",
       "      <td>0.149315</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2166529</td>\n",
       "      <td>6928</td>\n",
       "      <td>20160727</td>\n",
       "      <td>0.005448</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2166529</td>\n",
       "      <td>1808</td>\n",
       "      <td>20160727</td>\n",
       "      <td>0.018104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>6172162</td>\n",
       "      <td>6500</td>\n",
       "      <td>20160708</td>\n",
       "      <td>0.065901</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   User_id  Coupon_id  Date_received  Probability\n",
       "0  4129537       9983       20160712     0.103367\n",
       "1  6949378       3429       20160706     0.149315\n",
       "2  2166529       6928       20160727     0.005448\n",
       "3  2166529       1808       20160727     0.018104\n",
       "4  6172162       6500       20160708     0.065901"
      ]
     },
     "execution_count": 160,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# test prediction for submission\n",
    "y_test_pred = model.predict_proba(dftest[predictors])\n",
    "dftest1 = dftest[['User_id','Coupon_id','Date_received']].copy()\n",
    "dftest1['Probability'] = y_test_pred[:,1]\n",
    "dftest1.to_csv('results/submit1.csv', index=False, header=False)\n",
    "dftest1.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7. Saving & loading the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 161,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not os.path.isfile('SavedModel/1_model.pkl'):\n",
    "    with open('SavedModel/1_model.pkl', 'wb') as f:\n",
    "        pickle.dump(model, f)\n",
    "else:\n",
    "    with open('SavedModel/1_model.pkl', 'rb') as f:\n",
    "        model = pickle.load(f)"
   ]
  },
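  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> A minimal sketch of the same save/load idea as a pair of helpers, so that both directions share one path. The names `save_model` and `load_model` and the file `demo_model.pkl` are made up for this example, and a plain dict stands in for a fitted model. Note that `pickle.load` can execute arbitrary code, so it should only be used on files you trust."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pickle\n",
    "import tempfile\n",
    "\n",
    "def save_model(model, path):\n",
    "    # Create the parent folder if it does not exist yet, then pickle the model\n",
    "    folder = os.path.dirname(path)\n",
    "    if folder:\n",
    "        os.makedirs(folder, exist_ok=True)\n",
    "    with open(path, 'wb') as f:\n",
    "        pickle.dump(model, f)\n",
    "\n",
    "def load_model(path):\n",
    "    # Read back from the same path that save_model wrote to\n",
    "    with open(path, 'rb') as f:\n",
    "        return pickle.load(f)\n",
    "\n",
    "# Round trip in a temporary folder, with a dict standing in for a fitted model\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    demo_path = os.path.join(tmp, 'SavedModel', 'demo_model.pkl')\n",
    "    save_model({'coef': [0.1, 0.2]}, demo_path)\n",
    "    print(load_model(demo_path) == {'coef': [0.1, 0.2]})"
   ]
  },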
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  8 Ideas for improving the model\n",
    "- **Feature engineering**\n",
    "\n",
    "- **Machine learning algorithms**\n",
    "\n",
    "- **Model ensembling**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 9 Reference code\n",
    ">[Code and write-up from the first-place team](https://github.com/wepe/O2O-Coupon-Usage-Forecast)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
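  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 10 Summary of key points\n",
    "> Notes on the functions and techniques used while working through this project.\n",
    ">> * unique(): `np.unique(arr)` returns the distinct values of the array, sorted in ascending order. pandas has its own version, called as `series.unique()`, which also returns the distinct values but in order of first appearance rather than sorted. pandas also provides `nunique()`, which returns the number of distinct values.\n",
    ">> * read_csv: `pd.read_csv(path)` converts missing values in the file to NaN by default. To turn this off, pass `keep_default_na=False`.\n",
    ">> * `DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)` applies `func` along an axis of the DataFrame. This is the signature from older pandas releases; newer versions drop `broadcast` and `reduce` in favor of `result_type`.\n",
    ">> * get_dummies is the pandas way to do one-hot encoding https://blog.csdn.net/maymay_/article/details/80198468\n",
    ">> * [Introduction to Python's datetime.date class](https://blog.csdn.net/sunjinjuan/article/details/79080068)\n",
    ">> * pd.to_datetime() converts strings or numbers into pandas datetime values\n",
    ">> * [SGDClassifier: linear classification with stochastic gradient descent](https://blog.csdn.net/WxyangID/article/details/80365779)\n",
    ">>> It fits a linear model like other linear classifiers, but updates the weights with stochastic gradient descent, one sample at a time (mini-batches can be fed in through `partial_fit`), which scales well to large datasets. For very large data SGDClassifier is a good first choice; other linear solvers may be slow or may not run at all.\n",
    ">> * [predict_proba in sklearn (and how it differs from predict)](https://blog.csdn.net/m0_37870649/article/details/79549142)\n",
    ">> * [Plotting ROC curves with sklearn](https://www.jianshu.com/p/1da84ac7ff03)\n",
    ">>> * ROC stands for Receiver Operating Characteristic. The curve originated in radar signal analysis for detecting enemy aircraft during World War II. It is a combined measure of sensitivity and specificity: a continuous score is cut at many different thresholds, sensitivity and specificity are computed at each one, and the curve is drawn with sensitivity on the y-axis and (1 - specificity) on the x-axis. The larger the area under the curve, the better the classifier separates the classes. The point on the curve closest to the top-left corner is the threshold at which sensitivity and specificity are both high.\n",
    ">>> * How is an ROC curve drawn?\n",
    ">>>> Sort the samples by the classifier's predicted score, then take each score in turn as the threshold (every sample at or above it is predicted positive) and compute FPR and TPR at that threshold. Plotting FPR on the x-axis against TPR on the y-axis gives the ROC curve, so FPR and TPR have to be computed first. They are defined as:<br>\n",
    "FPR = FP/(FP+TN)<br>\n",
    "TPR = TP/(TP+FN)\n",
    ">>>> TP, FN, FP and TN are defined in the table below, which shows the confusion matrix of a binary classification problem:\n",
    "![](https://img-blog.csdn.net/20150215181403168)\n",
    ">>>> TP: true positive. The sample is actually positive and is predicted positive<br>\n",
    ">>>> FN: false negative (a miss). The sample is actually positive but is predicted negative<br>\n",
    ">>>> FP: false positive (a false alarm). The sample is actually negative but is predicted positive<br>\n",
    ">>>> TN: true negative. The sample is actually negative and is predicted negative<br>\n",
    "![](https://img-blog.csdn.net/20150215181358147)\n",
    ">\n",
    ">\n",
    "> Some examples follow to make these easier to understand:"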
   ]
  },
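  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> A quick worked example of the two rates, using the standard definitions TPR = TP/(TP+FN) and FPR = FP/(FP+TN). The counts are made up purely for illustration: suppose one threshold gives TP = 40, FN = 10, FP = 5, TN = 45.\n",
    "\n",
    "$$\\mathrm{TPR} = \\frac{TP}{TP+FN} = \\frac{40}{40+10} = 0.8, \\qquad \\mathrm{FPR} = \\frac{FP}{FP+TN} = \\frac{5}{5+45} = 0.1$$\n",
    "\n",
    "> That threshold contributes the point (0.1, 0.8) to the ROC curve; repeating the calculation for every threshold traces out the whole curve."
   ]
  },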
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  1. pd.unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 2 4 5 3]\n",
      "[1 2 3 4 5]\n",
      "5\n"
     ]
    }
   ],
   "source": [
    "# pd.unique()#\n",
    "\n",
    "a = pd.Series([1, 2, 1, 4, 5, 5 ,3,3])\n",
    "print(a.unique())    # [1 2 4 5 3]\n",
    "b = np.unique(a)\n",
    "print(b)     # [1 2 3 4 5]\n",
    "print(a.nunique())    # 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  2. pd.get_dummies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   color_blue  color_green  color_red  class_A  class_B\n",
      "0           0            1          0        1        0\n",
      "1           0            0          1        0        1\n",
      "2           1            0          0        1        0\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>color_blue</th>\n",
       "      <th>color_green</th>\n",
       "      <th>color_red</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   color_blue  color_green  color_red\n",
       "0           0            1          0\n",
       "1           0            0          1\n",
       "2           1            0          0"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# pd.get_dummies\n",
    "\n",
    "df = pd.DataFrame([  \n",
    "            ['green' , 'A'],   \n",
    "            ['red'   , 'B'],   \n",
    "            ['blue'  , 'A']])  \n",
    "\n",
    "df.columns = ['color',  'class'] \n",
    "print(pd.get_dummies(df))   # read each row across: every category becomes a one-hot column, similar to sklearn's OneHotEncoder\n",
    " \n",
    "pd.get_dummies(df['color'], prefix='color')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  3. date.weekday()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2019-09-20\n",
      "Today is day 4 of the week (Monday is day 0)\n",
      "ISO weekday today: 5\n"
     ]
    }
   ],
   "source": [
    "# date is a calendar date class made up of a year, a month and a day.\n",
    "# Its constructor is: class datetime.date(year, month, day)\n",
    "\n",
    "today_date = date.today()\n",
    "print(today_date)\n",
    "week = date.weekday(today_date)   # Monday is 0, Tuesday is 1, ..., Sunday is 6\n",
    "print(\"Today is day {} of the week (Monday is day 0)\".format(week))\n",
    "week1 = date.isoweekday(today_date)   # Monday is 1, ..., Sunday is 7\n",
    "print(\"ISO weekday today: {}\".format(week1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> pd.to_datetime() <br>\n",
    "> &emsp;函数原型pandas.to_datetime（arg，errors ='raise'，utc = None，format = None，unit = None ）\n",
    "> ![](https://img2018.cnblogs.com/blog/1252882/201904/1252882-20190408153601168-8730041.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
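    "# Parse dates and times into a standard datetime type.\n",
    "# When a DataFrame holds many rows whose date strings are not in a uniform format, pd.to_datetime converts the column to one standard representation.\n",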
    "# df['date_formatted']=pd.to_datetime(df['date'],format='%Y-%m-%d')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
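    "####  4. Using predict_proba in sklearn\n",
    "> predict_proba returns an array with n rows and k columns. The value in row i, column j is the predicted probability that test sample i belongs to class j, with columns ordered as in `clf.classes_`. Each row sums to 1."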
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2 3 2]\n",
      "[[0.75922437 0.24077563]\n",
      " [0.39896352 0.60103648]\n",
      " [0.71389191 0.28610809]]\n"
     ]
    }
   ],
   "source": [
    "x_train = np.array([[1,2,3], [1,3,4], [2,1,2], [4,5,6], [3,5,3], [1,7,2]])\n",
    "y_train = np.array([3, 3, 2, 2, 2, 2])\n",
    "\n",
    "x_test = np.array([[2,2,2], [3,2,6], [1,7,4]])\n",
    "\n",
    "clf = LogisticRegression()\n",
    "clf.fit(x_train, y_train)\n",
    "\n",
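    "# Predicted labels\n",
    "print(clf.predict(x_test))\n",
    "\n",
    "# Predicted probability of each label; columns follow clf.classes_, here [2, 3]\n",
    "print(clf.predict_proba(x_test))\n",
    "\n",
    "# Output recorded for this cell (exact values vary with the sklearn version and solver):\n",
    "# [2 3 2]\n",
    "# [[0.75922437 0.24077563]\n",
    "#  [0.39896352 0.60103648]\n",
    "#  [0.71389191 0.28610809]]\n",
    "# Reading the result:\n",
    "# [2,2,2]: label 2 with probability 0.75922437, label 3 with probability 0.24077563\n",
    "# [3,2,6]: label 2 with probability 0.39896352, label 3 with probability 0.60103648\n",
    "# [1,7,4]: label 2 with probability 0.71389191, label 3 with probability 0.28610809\n",
    "# predict returns the label with the highest probability in each row"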
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  5. The groupby() function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  key1 key2     data1     data2\n",
      "0    a  one -1.483122 -0.577277\n",
      "1    a  two  1.623217 -0.401235\n",
      "2    b  one -1.554287 -0.200688\n",
      "3    b  two -0.334798  0.308388\n",
      "4    a  one  0.676731  2.197392\n",
      "\n",
      "key1\n",
      "a    0.272275\n",
      "b   -0.944542\n",
      "Name: data1, dtype: float64\n",
      "\n",
      "\n",
      "i[1]: \n",
      "   key1 key2     data1     data2\n",
      "0    a  one -1.483122 -0.577277\n",
      "1    a  two  1.623217 -0.401235\n",
      "4    a  one  0.676731  2.197392\n",
      "i: \n",
      " ('a',   key1 key2     data1     data2\n",
      "0    a  one -1.483122 -0.577277\n",
      "1    a  two  1.623217 -0.401235\n",
      "4    a  one  0.676731  2.197392)\n",
      "i[1]: \n",
      "   key1 key2     data1     data2\n",
      "2    b  one -1.554287 -0.200688\n",
      "3    b  two -0.334798  0.308388\n",
      "i: \n",
      " ('b',   key1 key2     data1     data2\n",
      "2    b  one -1.554287 -0.200688\n",
      "3    b  two -0.334798  0.308388)\n"
     ]
    }
   ],
   "source": [
    "df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],\n",
    "                     'key2': ['one', 'two', 'one', 'two', 'one'],\n",
    "                     'data1': np.random.randn(5),\n",
    "                     'data2': np.random.randn(5)})\n",
    "print(df)\n",
    "\n",
    "# Group data1 by the values of key1, then take the mean of each group\n",
    "print()\n",
    "grouped = df['data1'].groupby(df['key1']).mean()\n",
    "print(grouped)\n",
    "\n",
    "# Iterate over the groups and print them; each item is a tuple\n",
    "print(\"\\n\")\n",
    "grouped = df.groupby(df['key1'])\n",
    "for i in grouped:\n",
    "    print(\"i[1]: \\n\", i[1])\n",
    "    print(\"i: \\n\",i)      #  a tuple: element 0 is the group key, element 1 is that group's DataFrame\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#####  The examples above all group rows; the following example groups columns instead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 153,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        key1  key2     data1     data2\n",
      "joe        1    10 -0.142260 -1.564753\n",
      "steve      2    20 -0.198841 -0.234167\n",
      "wes        3    30  1.318651  0.772309\n",
      "jim        4    40 -0.978296 -0.026368\n",
      "travis     5    50 -0.085436  2.034636\n",
      "            blue   red\n",
      "joe    -0.853507   5.5\n",
      "steve  -0.216504  11.0\n",
      "wes     1.045480  16.5\n",
      "jim    -0.502332  22.0\n",
      "travis  0.974600  27.5\n"
     ]
    }
   ],
   "source": [
    "df = pd.DataFrame({'key1': [1, 2, 3, 4, 5],\n",
    "                       'key2': [10, 20, 30, 40, 50],\n",
    "                       'data1': np.random.randn(5),\n",
    "                       'data2': np.random.randn(5)},index=['joe','steve','wes','jim','travis'])\n",
    "print(df)\n",
    "\n",
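    "# Group by columns\n",
    "groupBy = {'key1': 'red', 'key2': 'red', 'data1': 'blue',\n",
    "               'data2': 'blue'}            #  custom mapping: column name -> group label\n",
    "# Note: groupby(..., axis=1) is deprecated in recent pandas releases; df.T.groupby(groupBy).mean().T gives the same result\n",
    "grouped = df.groupby(groupBy, axis=1).mean()\n",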
    "print(grouped)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
